This invention was made with government support under EY-09404 awarded by
the National Institutes of Health. The U.S. Government has certain rights in
the invention.

Cross-Reference to Related Applications: 

Field of the Invention: 

This invention relates to human galactokinase and the identification of
galactokinase mutations, namely a missense mutation and a nonsense mutation, as
well as to isolated nucleic acids encoding same, recombinant host cells transformed
with DNA encoding such proteins, and uses of the expressed proteins and nucleic
acid sequences in therapeutic and diagnostic applications.

Background of the Invention: 

[...] which are recessive. Many have devastating effects that may include a
combination of several clinical features, such as severe mental retardation,
impairment of the peripheral nervous system, blindness, hearing deficiency and
organomegaly. Most of the [...] treated by [...]

Galactokinase deficiency is one of three known forms of galactosemia. The
other forms are galactose-1-phosphate uridyltransferase deficiency and
UDP-galactose-4-epimerase deficiency. All three enzymes are involved in galactose
metabolism, i.e., the conversion of galactose to glucose in the body. Galactokinase
deficiency is inherited as an autosomal recessive trait with a heterozygote frequency
estimated to be 0.2% in the general population (see, e.g., Levy et al., J. [...],
22:871-877 (1978)). Patients with homozygous galactokinase deficiency usually
become symptomatic in the early infantile period, showing galactosemia,
galactosuria, increased galactitol levels, cataracts and, in a few cases, mental
retardation (Segal et al., [...], 25:750-752 (1979)). These symptoms usually
improve dramatically with the administration of a galactose-free diet.
Heterozygotes for galactokinase deficiency are prone to presenile cataracts, with
onset between 20 and 50 years of age (Stambolian et al., Invest. Ophthal. Vis. Sci.,
22:429-433 (1986)).

Galactokinase activity has been found in a variety of mammalian tissues,
including liver, kidney, brain, lens, placenta, erythrocytes and leukocytes. While
the protein has been purified from E. coli, the purification of the protein from
mammalian tissues has proven difficult due to its low cellular concentration. In
addition, the molecular basis of galactokinase deficiency is unknown.

This invention provides a human galactokinase gene. The DNAs of this
invention, such as the specific sequences disclosed herein, are useful in that they
encode the genetic information required for expression of this protein. Additionally,
the sequences may be used as probes in order to isolate and identify additional
members of the family, type and/or subtype, as well as mutations which may form the
basis of galactokinase deficiency and which may be characterized by site-specific
mutations or by atypical expression of the galactokinase gene. The galactokinase
gene is also useful as a diagnostic agent to identify mutant galactokinase proteins or
as a therapeutic agent via gene therapy.

The first clinical trials of gene therapy began in 1990. Since that time,
more than 70 clinical trial protocols have been reviewed and approved by a
regulatory authority such as the NIH's Recombinant Advisory Committee (RAC);
see, e.g., Anderson, W. F., Human Gene Therapy, 1:281-282 (1994). The
therapeutic treatment of diseases and disorders by gene therapy involves the transfer
and stable insertion of new genetic information into cells. The correction of a
genetic defect by re-introduction of the normal allele of a gene has demonstrated
that this concept is clinically feasible (see, e.g., Rosenberg et al., New Eng. J. Med.,
222:570 (1990)).

These and additional uses for the reagents described herein will become
apparent to those of ordinary skill in the art upon reading this specification.

Summary of the Invention: 

This invention provides isolated nucleic acid molecules encoding human
galactokinase, as well as nucleic acid molecules encoding human galactokinase
having missense and nonsense mutations, including mRNAs, DNAs (e.g., cDNA,
genomic DNA, etc.), antisense analogs thereof, and diagnostically or therapeutically
useful fragments thereof.

This invention also provides recombinant vectors, such as cloning and
expression plasmids, useful as reagents in the recombinant production of human
galactokinase proteins, as well as recombinant prokaryotic and/or eukaryotic host
cells comprising a human galactokinase nucleic acid sequence.

This invention also provides a process for preparing human galactokinase
proteins, which comprises culturing recombinant prokaryotic and/or eukaryotic host
cells containing a human galactokinase nucleic acid sequence under conditions
promoting expression of said protein, and subsequently recovering said protein.
Another related aspect of this invention is isolated human galactokinase proteins
produced by said method. In yet another aspect, this invention also provides
antibodies that are directed to (i.e., bind) human galactokinase proteins.

This invention also provides isolated human galactokinase proteins
having a missense or nonsense mutation, and antibodies (monoclonal or polyclonal)
that are specifically reactive with said proteins.

This invention also provides nucleic acid probes and PCR primers
comprising nucleic acid molecules of sufficient length to specifically hybridize to
human galactokinase sequences.

This invention also provides a method to diagnose human galactokinase
deficiency, which comprises isolating a nucleic acid sample from an individual,
assaying the sequence of said nucleic acid sample against the reference gene of the
invention, and comparing differences between said sample and the nucleic acid of
the instant invention, wherein said differences indicate mutations in the human
galactokinase gene isolated from the individual. The sample can be assayed by
direct sequence comparison (i.e., DNA sequencing), wherein the sample nucleic
acid is compared to the reference galactokinase gene; by hybridization (e.g.,
mobility shift assays such as heteroduplex gel electrophoresis, SSCP, or other
techniques such as Northern or Southern blotting which are based upon the length of
the nucleic acid sequence); or by other known gel electrophoresis methods, such as
restriction endonuclease digestion of a sample amplified by PCR (for DNA) or
PCR-RT (for RNA). Alternatively, the diagnostic method comprises isolating cells
containing genomic DNA from an individual and assaying said sample (e.g., cellular
RNA) by in situ hybridization, using as a probe the DNA sequence of the invention,
at least one exon, or a fragment containing at least 15, preferably 18, and more
preferably 21 contiguous base pairs. This invention also provides an antisense
oligonucleotide having a sequence capable of binding with mRNAs encoding human
galactokinase so as to identify mutant galactokinase genes.
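The direct sequence comparison step can be sketched as follows. This is a minimal illustration, assuming the sample and reference have already been sequenced and aligned to the same length (no insertions or deletions); the function name `find_substitutions` and the toy sequences are hypothetical and are not taken from SEQ ID NO:4.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch.

    Assumes both sequences are already aligned and of equal length; insertions
    and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != sample_base
    ]


# Toy sequences for illustration only (not the galactokinase sequence).
reference = "ATGGCTGCCGAGTGA"
sample = "ATGGCTACCGAGTGA"
print(find_substitutions(reference, sample))  # [(7, 'G', 'A')]
```

Each reported difference would then be evaluated to decide whether it is a benign polymorphism of the reference gene or a mutation of the kind described herein.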

This invention also provides yet another method to diagnose human
galactokinase deficiency, which comprises obtaining a serum or tissue sample;
allowing such sample to come in contact with an antibody or antibody fragment






WO 96/09374



PCT/US95/06743



which specifically binds to a mutant human galactokinase protein of the invention
under conditions such that an antigen-antibody complex is formed between said
antibody (or antibody fragment) and said mutant galactokinase protein; and
detecting the presence or absence of said complex.

This invention also provides transgenic non-human animals comprising a
nucleic acid molecule encoding human galactokinase. Also provided are methods
for use of said transgenic animals as models for disease states, mutation and SAR
(structure-activity relationship) studies.

This invention also provides a method for treating conditions which are
related to insufficient human galactokinase activity, which comprises administering
to a patient in need thereof a pharmaceutical composition containing the
galactokinase protein of the invention in an amount effective to supplement the
patient's endogenous galactokinase and thereby alleviate said condition.

This invention also provides a method for treating conditions which are
related to insufficient human galactokinase activity via gene therapy. An additional,
or reference, gene comprising the non-mutant galactokinase gene of the instant
invention is inserted into a patient's cells either in vivo or ex vivo. The reference
gene is expressed in the transfected cells and, as a result, the protein encoded by the
reference gene corrects the defect (i.e., galactokinase deficiency), thus permitting the
transfected cells to function normally and alleviating disease conditions (or
symptoms).

Brief Description of the Drawings: 

Figure 1 depicts the intron/exon organization of the human galactokinase
gene.

Figure 2 is the genomic DNA sequence (with single-letter amino acid
abbreviations) for human galactokinase [SEQ ID NO: 7]. The bolded DNA
sequence corresponds to the exon regions, whereas the normal (unbolded) type
corresponds to the intron regions of human galactokinase.

Detailed Description of the Invention: 

This invention relates to human galactokinase (amino acid and nucleotide
sequences) and its use as a diagnostic and therapeutic agent. The particular cDNA
and amino acid sequence of human galactokinase is identified by SEQ ID NO:4, as
described more fully below. This invention also relates to the genomic DNA
sequence for human galactokinase [SEQ ID NO: 7] and to mutant human
galactokinase genes and amino acid sequences [SEQ ID NOS: 5 and 6] and their use
for diagnostic purposes.

In further describing the present invention, the following additional terms
will be employed, and are intended to be defined as indicated below.

An "antigen" refers to a molecule containing one or more epitopes
that will stimulate a host's immune system to make a humoral and/or cellular
antigen-specific response. The term is also used herein interchangeably with
"immunogen."

The term "epitope" refers to the site on an antigen or hapten to which
a specific antibody molecule binds. The term is also used herein interchangeably
with "antigenic determinant" or "antigenic determinant site."

A coding sequence is "operably linked to" another coding sequence
when RNA polymerase will transcribe the two coding sequences into a single
mRNA, which is then translated into a single polypeptide having amino acids
derived from both coding sequences. The coding sequences need not be contiguous
to one another so long as the expressed sequence is ultimately processed to produce
the desired protein.

"Recombinant" polypeptides refer to polypeptides produced by
recombinant DNA techniques, i.e., produced from cells transformed by an
exogenous DNA construct encoding the desired polypeptide. "Synthetic"
polypeptides are those prepared by chemical synthesis.

A "replicon" is any genetic element (e.g., plasmid, chromosome,
virus) that functions as an autonomous unit of DNA replication in vivo, i.e., capable
of replication under its own control.

A "vector" is a replicon, such as a plasmid, phage, or cosmid, to
which another DNA segment may be attached so as to bring about the replication of
the attached segment.

A "replication-deficient virus" is a virus in which the excision and/or
replication functions have been altered such that, after transfection into a host cell,
the virus is not able to reproduce and/or infect additional cells.

A "reference" gene refers to the galactokinase sequence of the 
invention and is understood to include the various sequence polymorphisms that 
exist, wherein nucleotide substitutions in the gene sequence exist, but do not affect 
the essential function of the gene product. 

A "mutant" gene refers to galactokinase sequences different from the
reference gene, wherein nucleotide substitutions and/or deletions and/or insertions
result in impairment of the essential function of the gene product such that the levels
of galactose in an individual (or patient) are atypically elevated. For example, the G
to A substitution at position 122 of human galactokinase [SEQ ID NO: 5] is a
missense mutation associated with patients who are galactokinase deficient. A
second mutation, a T for G substitution, produces an in-frame nonsense codon at
amino acid position 80 of the mature protein. The result is a truncated protein
consisting of the first 79 amino acids of human galactokinase.
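Why an in-frame stop codon at position 80 leaves a 79-residue product can be illustrated with a toy open reading frame. This sketch uses the standard genetic code (NCBI table 1) and a hypothetical 100-codon ORF (ATG followed by 99 GAG codons), not the galactokinase sequence; the helper names `translate_until_stop` and `substitute` are illustrative only, and codon 1 is counted as the initiator ATG.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_until_stop(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitute(cds: str, position: int, new_base: str) -> str:
    """Return cds with the base at 1-based `position` replaced by `new_base`."""
    return cds[:position - 1] + new_base + cds[position:]


# Hypothetical 100-codon ORF: ATG followed by 99 GAG (Glu) codons.
wild_type = "ATG" + "GAG" * 99
# A T-for-G change at the first base of codon 80 turns GAG into the stop codon TAG.
codon_80_first_base = (80 - 1) * 3 + 1
mutant = substitute(wild_type, codon_80_first_base, "T")

print(len(translate_until_stop(wild_type)))  # 100
print(len(translate_until_stop(mutant)))     # 79
```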

A DNA "coding sequence of" or a "nucleotide sequence encoding" a
particular protein is a DNA sequence which is transcribed and translated into a
polypeptide when placed under the control of appropriate regulatory sequences.

A "promoter sequence" is a DNA regulatory region capable of
binding RNA polymerase in a cell and initiating transcription of a downstream (3'
direction) coding sequence. For purposes of defining the present invention, the
promoter sequence is bound at the 3' terminus by a translation start codon (e.g.,
ATG) of a coding sequence and extends upstream (5' direction) to include the
minimum number of bases or elements necessary to initiate transcription at levels
detectable above background. Within the promoter sequence will be found a
transcription initiation site (conveniently defined by mapping with nuclease S1) as
well as protein binding domains (consensus sequences) responsible for the binding
of RNA polymerase. Eukaryotic promoters will often, but not always, contain
"TATA" boxes and "CAT" boxes. Prokaryotic promoters contain Shine-Dalgarno
sequences in addition to the -10 and -35 consensus sequences.
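As a rough illustration of the -10 and -35 consensus elements, the sketch below scans a DNA string for exact matches to the E. coli sigma-70 consensus boxes TTGACA (-35) and TATAAT (-10) separated by a spacer. The function name and the 15 to 19 bp spacer window are assumptions of this sketch; natural promoters usually deviate from the consensus, so exact matching will miss most real sites.

```python
import re
from typing import List, Tuple

MINUS_35 = "TTGACA"  # -35 consensus box
MINUS_10 = "TATAAT"  # -10 consensus box


def find_sigma70_like_promoters(
    seq: str, min_spacer: int = 15, max_spacer: int = 19
) -> List[Tuple[int, int]]:
    """Return 0-based (start of -35 box, start of -10 box) pairs.

    Only exact consensus matches separated by an allowed spacer length
    are reported; mismatches to the consensus are not tolerated.
    """
    seq = seq.upper()
    hits = []
    # The lookahead lets overlapping -35 matches be found.
    for match in re.finditer(f"(?={MINUS_35})", seq):
        start35 = match.start()
        for spacer in range(min_spacer, max_spacer + 1):
            start10 = start35 + len(MINUS_35) + spacer
            if seq[start10:start10 + len(MINUS_10)] == MINUS_10:
                hits.append((start35, start10))
    return hits


# Toy sequence: -35 box, 17-bp spacer, -10 box.
example = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGC"
print(find_sigma70_like_promoters(example))  # [(2, 25)]
```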

DNA "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for
the expression (i.e., the transcription and translation) of a coding sequence in a host
cell.

A control sequence "directs the expression" of a coding sequence in a 
cell when RNA polymerase will bind the promoter sequence and transcribe the 
coding sequence into mRNA, which is then translated into the polypeptide encoded 
by the coding sequence. 

A "host cell" is a cell which has been transformed or transfected or is 
capable of transformation or transfection by an exogenous DNA sequence. 

A cell has been "transformed" by exogenous DNA when such
exogenous DNA has been introduced inside the cell membrane. Exogenous DNA
may or may not be integrated (covalently linked) into chromosomal DNA making up
the genome of the cell. In prokaryotes and yeasts, for example, the exogenous DNA
may be maintained on an episomal element, such as a plasmid. With respect to
eukaryotic cells, a stably transformed or transfected cell is one in which the
exogenous DNA has become integrated into the chromosome so that it is inherited
by daughter cells through chromosome replication. This stability is demonstrated by
the ability of the eukaryotic cell to establish cell lines or clones comprised of a
population of daughter cells containing the exogenous DNA.

[...] up foreign DNA and integrate that foreign DNA into their chromosome
[...] by various techniques in which cells take up DNA (e.g., calcium phosphate
precipitation, electroporation, [...]) [...] cell types (or cell [...]).

A "clone" is a population of cells derived from a single cell or
common ancestor by mitosis. A "cell line" is a clone of a primary cell that is
capable of stable growth in vitro for many generations.

A "heterologous" region of a DNA construct is an identifiable
segment of DNA within or attached to another DNA molecule that is not found in
association with the other molecule in nature. Thus, when the heterologous region
encodes a gene, the gene will usually be flanked by DNA that does not flank the
gene in the genome of the source animal. Another example of a heterologous coding
sequence is a construct where the coding sequence itself is not found in nature (e.g.,
[...]). Allelic variation or naturally occurring mutational events do not give rise to a
heterologous region of DNA as defined herein.

"Conditions which are related to insufficient human galactokinase activity"
or "a deficiency in galactokinase activity" means mutations of the galactokinase
protein which affect galactokinase activity, or which may affect expression of
galactokinase, or both, such that the levels of galactose in a patient are atypically
elevated. In addition, this definition is intended to cover atypically low levels of
galactokinase expression in a patient due to defective control sequences for the
reference galactokinase protein.

This invention provides an isolated nucleic acid molecule encoding a human
galactokinase protein, and substantially similar sequences. Isolated nucleic acid
sequences are "substantially similar" if: (i) they are approximately the same length
(i.e., at least 80% of the coding region of SEQ ID NO:4); (ii) they encode a protein
with the same (i.e., within an order of magnitude) galactokinase activity as the
protein encoded by SEQ ID NO:4; and (iii) they are capable of hybridizing under
moderately stringent conditions to SEQ ID NO:4; or if they are DNA sequences
which are degenerate to SEQ ID NO:4. Degenerate DNA sequences encode the
same amino acid sequence as SEQ ID NO:4, but have variation(s) in the nucleotide
coding sequences. Hybridization under moderately stringent conditions is outlined
below.
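Whether two DNA sequences are degenerate to one another (different nucleotides, same encoded amino acid sequence) can be checked by translating both. A minimal sketch, assuming the standard genetic code and in-frame sequences that start at codon 1; `translate` and `is_degenerate_variant` are hypothetical helper names. Stop codons are kept as `*`, so TAA and TAG are treated as degenerate termination sites.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate all complete codons; stop codons appear as '*'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ in nucleotides but encode the same amino acids."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


print(is_degenerate_variant("ATGCTGAAA", "ATGTTAAAG"))  # True: both encode M-L-K
print(is_degenerate_variant("ATGCTGAAA", "ATGCCGAAA"))  # False: L becomes P
```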

Hybridization under moderately stringent conditions can be performed as
follows. Nitrocellulose filters are prehybridized at 65°C in a solution containing 6X
SSPE, 5X Denhardt's solution (10 g Ficoll, 10 g BSA and 10 g polyvinylpyrrolidone
per liter solution), 0.05% SDS and 100 micrograms tRNA. Hybridization probes are
labeled, preferably radiolabelled (e.g., using the Bios TAG-IT® kit). Hybridization is
then carried out for approximately 18 hours at 65°C. The filters are then washed in a
solution of 2X SSC and 0.5% SDS at room temperature for 15 minutes (repeated
once). Subsequently, the filters are washed at 58°C, air-dried and exposed to X-ray
film overnight at -70°C with an intensifying screen.

Alternatively, sequences are "substantially similar" when about 66%
(preferably about 75%, and most preferably about 90%) of the nucleotides or amino
acids match over a defined length of the molecule (i.e., at least 80% of the coding
region of SEQ ID NO:4) and the protein encoded by such sequence has the same
(i.e., within an order of magnitude) galactokinase activity as the protein encoded by
SEQ ID NO:4. As used herein, "substantially similar" refers to sequences having
similar identity to the sequences of the instant invention. Thus, nucleotide sequences
that are substantially the same can be identified by hybridization or by sequence
comparison. Protein sequences that are substantially the same can be identified by
one or more of the following: proteolytic digestion, gel electrophoresis and/or
microsequencing.
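The sequence-comparison criterion can be sketched as a percent-identity check over a pre-aligned region that covers at least 80% of the reference coding length. This is a simplified illustration: it assumes an ungapped alignment, treats "about 66/75/90%" as hard cut-offs, and does not model the galactokinase-activity requirement. The names `percent_identity`, `identity_tier` and the tier labels are hypothetical.

```python
from typing import Optional

# Identity thresholds from the text; treating "about" as a hard cut-off is an assumption.
TIERS = ((90.0, "most preferred"), (75.0, "preferred"), (66.0, "substantially similar"))


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def identity_tier(a: str, b: str, reference_coding_length: int) -> Optional[str]:
    """Classify an aligned comparison; None if the compared region is too short.

    The compared length must cover at least 80% of the reference coding region.
    The galactokinase-activity half of the definition is not modelled here.
    """
    if len(a) < 0.8 * reference_coding_length:
        return None
    identity = percent_identity(a, b)
    for threshold, label in TIERS:
        if identity >= threshold:
            return label
    return "below threshold"


# 8 of 10 positions match (80%), and 10 bases cover at least 80% of a 12-base reference.
print(identity_tier("ACGTACGTAC", "ACGTACGTTT", reference_coding_length=12))  # preferred
```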

This invention also provides isolated nucleic acid molecules encoding a
missense mutation (SEQ ID NO:5) or a nonsense mutation (SEQ ID NO:6) of the
human galactokinase protein, and DNA sequences which are degenerate to SEQ ID
NO:5 or 6. Degenerate DNA sequences encode the same amino acid (or termination
site) sequence as SEQ ID NO:5 or 6, but have variation(s) in the nucleotide coding
sequences.

One means for isolating a nucleic acid molecule encoding a human
galactokinase is to probe a human genomic or cDNA library with a natural or
artificially designed probe using art-recognized procedures (see, for example,
"Current Protocols in Molecular Biology", Ausubel, F.M., et al. (eds.), Greene
Publishing Assoc. and John Wiley Interscience, New York, 1989, 1992). It will be
appreciated by one skilled in the art that SEQ ID NO:4, or fragments thereof
(comprising at least 15 contiguous nucleotides), is a particularly useful probe.
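Enumerating candidate probes of at least 15 contiguous nucleotides from a reference sequence can be sketched as below. The sketch tiles every window, reports its antisense (reverse-complement) form, and estimates a melting temperature with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough rule of thumb for short oligonucleotides. The helper names and the 18-nt toy sequence are hypothetical; real probe selection would also screen for repeats and secondary structure.

```python
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def candidate_probes(reference: str, length: int = 15) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (0-based start, sense probe, antisense probe, Tm) for each window."""
    if length < 15:
        raise ValueError("probes should span at least 15 contiguous nucleotides")
    ref = reference.upper()
    for start in range(len(ref) - length + 1):
        probe = ref[start:start + length]
        yield start, probe, reverse_complement(probe), wallace_tm(probe)


# Toy 18-nt stretch (hypothetical, not taken from SEQ ID NO:4).
for start, sense, antisense, tm in candidate_probes("ATGGCTGCCGAGTGACCA", length=15):
    print(start, sense, antisense, tm)
```

Longer windows (18 or 21 nucleotides, as preferred in the text) are obtained by changing `length`.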

Several particularly useful probes for this purpose are set forth [...], including
transcriptional [...] other stability. [...] regions relative to the coding sequences
disclosed herein. [...] gene corrects the defect (i.e., galactokinase deficiency) [...]
function normally [...] introduce the therapeutic [...]. For example, mouse Moloney
leukemia virus (MMLV) is [...] (see, e.g., [...]; [...] et al., Curr. Opin. [...]).

In vivo gene therapy does not require isolation and [...] of patients' cells. The
therapeutic gene is typically packaged [...] a patient, such as in liposomes or in a
replication-deficient virus (e.g., adenovirus; see, e.g., Berkner, K.L., [...]) or an
adeno-associated virus (AAV) vector (see, e.g., [...]:97-129 (1992) and U.S. Patent
5,252,479, "Safe Vector for Gene Therapy"). Another [...] and endothelial cells.
Preferably, [...] (or pulmonary) epithelial cells. Transfection of [...] epithelial
cells can occur via inhalation of a nebulized preparation of DNA vectors in
liposomes, DNA-protein complexes or replication-deficient adenoviruses (see, e.g.,
U.S. Patent 5,240,846, "Gene Therapy Vector for Cystic Fibrosis").

This invention also provides a process to prepare human galactokinase
proteins. Non-mutant proteins are defined with reference to the amino acid sequence
listed in SEQ ID NO:4 and include variants with a substantially similar amino acid
sequence that have the same galactokinase activity. Additional proteins of this
invention include mutant human galactokinase proteins as set forth in SEQ ID NO: 5
or 6. The proteins of this invention are preferably made by recombinant genetic
engineering techniques. The isolated nucleic acids, particularly the DNAs, can be
introduced into expression vectors by operatively linking the DNA to the necessary
expression control regions (e.g., regulatory regions) required for gene expression.
The vectors can be introduced into the appropriate host cells, such as prokaryotic
(e.g., bacterial) or eukaryotic (e.g., yeast or mammalian) cells, by methods well
known in the art (Ausubel et al., supra). The coding sequences for the desired
proteins, having been prepared or isolated, can be cloned into any suitable vector or
replicon. Numerous cloning vectors are known to those of skill in the art, and the
selection of an appropriate cloning vector is a matter of choice. Examples of
recombinant DNA vectors for cloning, and host cells which they can transform,
include, but are not limited to, bacteriophage λ (E. coli), pBR322 (E. coli),
pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative
bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative
bacteria), pHV14 (E. coli and Bacillus subtilis), pBD9 (Bacillus), pIJ61
(Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), a baculovirus insect
cell system, a Drosophila insect system, and YCp19 (Saccharomyces). See generally
"DNA Cloning", Vols. I & II, Glover et al., eds., IRL Press, Oxford (1985) (1987);
and T. Maniatis, "Molecular Cloning", Cold Spring Harbor Laboratory (1982).

The gene can be placed under the control of a promoter, ribosome
binding site (for bacterial expression) and, optionally, an operator (collectively
referred to herein as "control" elements), so that the DNA sequence encoding the
desired protein is transcribed into RNA in the host cell transformed by a vector
containing this expression construction. The coding sequence may or may not
contain a signal peptide or leader sequence. The subunit antigens of the present
invention can be expressed using, for example, the E. coli tac promoter or the protein
A gene (spa) promoter and signal sequence. Leader sequences can be removed by the
bacterial host in post-translational processing. See, e.g., U.S. Patent Nos. 4,431,739;
4,425,437; 4,338,397.

In addition to control sequences, it may be desirable to add regulatory
sequences which allow for regulation of the expression of the protein sequences
relative to the growth of the host cell. Regulatory sequences are known to those of
skill in the art, and examples include those which cause the expression of a gene to be
turned on or off in response to a chemical or physical stimulus, including the presence
of a regulatory compound. Other types of regulatory elements may also be present in
the vector, for example, enhancer sequences.

An expression vector is constructed so that the particular coding
sequence is located in the vector with the appropriate regulatory sequences, the
positioning and orientation of the coding sequence with respect to the control
sequences being such that the coding sequence is transcribed under the "control" of
the control sequences (i.e., RNA polymerase which binds to the DNA molecule at the
control sequences transcribes the coding sequence). Modification of the sequences
encoding the particular antigen of interest may be desirable to achieve this end. For
example, in some cases it may be necessary to modify the sequence so that it may be
attached to the control sequences with the appropriate orientation, i.e., to maintain
the reading frame. The control sequences and other regulatory sequences may be
ligated to the coding sequence prior to insertion into a vector, such as the cloning
vectors described above. Alternatively, the coding sequence can be cloned directly
into an expression vector which already contains the control sequences and an
appropriate restriction site.
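The reading-frame requirement for joining a coding sequence to an upstream start codon can be expressed as a simple check. A minimal sketch, assuming the upstream segment begins at the start codon and runs to the cloning junction; `keeps_reading_frame` is a hypothetical helper, and orientation of the insert is not checked.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def keeps_reading_frame(upstream_orf: str, insert: str) -> bool:
    """Check that fusing `insert` after `upstream_orf` preserves one open reading frame.

    upstream_orf: sequence from the start codon (e.g., ATG) up to the junction.
    True when the junction falls on a codon boundary and no stop codon appears
    in frame before the last codon of the fused sequence.
    """
    if len(upstream_orf) % 3 != 0:
        return False
    fused = (upstream_orf + insert).upper()
    codons = [fused[i:i + 3] for i in range(0, len(fused) - 2, 3)]
    return not any(codon in STOP_CODONS for codon in codons[:-1])


print(keeps_reading_frame("ATGGCT", "GAAAAATAA"))  # True: ATG GCT GAA AAA TAA
print(keeps_reading_frame("ATGGC", "TGAAAATAA"))   # False: junction off the codon boundary
print(keeps_reading_frame("ATGGCT", "TGAAAATAA"))  # False: premature in-frame TGA
```

If the check fails, the junction can be adjusted (for example, by adding or removing one or two bases in a linker) until the fused sequence reads through in a single frame.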

In some cases it may be desirable to produce mutants or analogs of the
protein of interest. Mutants or analogs may be prepared by the deletion of a
portion of the sequence encoding the protein, by insertion of a sequence, and/or by
substitution of one or more nucleotides within the sequence. Techniques for
modifying nucleotide sequences, such as site-directed mutagenesis, are well known to
those skilled in the art. See, e.g., [...], supra; DNA Cloning, Vols. I and II, supra;
Nucleic Acid Hybridization, supra.

A number of prokaryotic expression vectors are known in the art. See,
e.g., U.S. Patent Nos. 4,578,355; 4,440,859; 4,436,815; 4,431,740; 4,431,739;
4,428,941; 4,425,437; 4,418,149; 4,411,994; 4,366,246; 4,342,832; see also U.K.
Patent Applications GB 2,121,054; GB 2,008,123; GB 2,007,675; and European
Patent Application 103,395. Yeast expression vectors are also known in the art. See,
e.g., U.S. Patent Nos. 4,446,235; 4,443,539; 4,430,428; see also European Patent
Applications 103,409; 100,561; 96,491. Mammalian expression vectors include
pSV2neo (as described in J. Mol. Appl. Genet. 1:327-341), which uses the SV40 late
promoter to drive expression in mammalian cells, and pCDNA1neo, a vector derived
from pCDNA1 (Mol. Cell. Biol. 7:4125-29), which uses the CMV promoter to drive
expression. Both of these latter vectors can be employed for transient or stable
(using G418 resistance) expression in mammalian cells. Insect cell expression
systems, e.g., Drosophila, are also useful; see, for example, PCT applications
WO 90/06358 and WO 92/06212 as well as EP 290,261-B1.

Depending on the expression system and host selected, the proteins of
the present invention are produced by growing host cells transformed by an
expression vector described above under conditions whereby the protein of interest is
expressed. Preferred host cells include human embryonic kidney (HEK-293) cells,
monkey kidney fibroblast (COS) cells, Chinese hamster ovary (CHO) cells,
Drosophila cells and murine L-cells. If the expression system secretes the protein
into the growth media, the protein can be purified directly from the media. If the
protein is not secreted, it is isolated from cell lysates or recovered from the cell
membrane fraction. The selection of the appropriate growth conditions and recovery
methods is within the skill of the art.

An alternative method to identify proteins of the present invention is
by constructing gene libraries, using the resulting clones to transform E. coli, and
pooling and screening individual colonies using polyclonal serum or monoclonal
antibodies to galactokinase.

The proteins of the present invention may also be produced by
chemical synthesis, such as solid phase peptide synthesis, using known amino acid
sequences or amino acid sequences derived from the DNA sequence of the genes of
interest. Such methods are known to those skilled in the art. Chemical synthesis of
peptides is not particularly preferred.

The proteins of the present invention, or their fragments comprising at
least one epitope, can be used to produce antibodies, both polyclonal and monoclonal.
If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit, goat,
horse, etc.) is immunized with the protein of the present invention, or a fragment
thereof, capable of eliciting an immune response (i.e., having at least one epitope).
Serum from the immunized animal is collected and treated according to known
procedures. If serum containing polyclonal antibodies is used, the polyclonal
antibodies can be purified by immunoaffinity chromatography or other known
procedures.

Monoclonal antibodies to the proteins of the present invention, and to
the fragments thereof, can also be readily produced by one skilled in the art. The
general methodology for making monoclonal antibodies by using hybridoma
technology is well known. Immortal antibody-producing cell lines can be created by
[...]; see, e.g., U.S. Patent Nos. [...]; 4,427,783; 4,444,887; 4,452,370; 4,466,917;
4,472,500; 4,491,632; and 4,493,890.

[...] can be screened for various properties, i.e., for isotype, epitope, affinity, etc.
Hence, one skilled in the art can produce monoclonal antibodies specifically reactive
with [...] mutation of SEQ ID NO:6. Monoclonal antibodies are useful in purification,
using immunoaffinity techniques, of the individual antigens which they are directed
against. Alternatively, genes encoding the monoclonals of interest may be isolated
from the hybridomas by PCR techniques known in the art, and cloned and expressed
in the appropriate vectors. The antibodies of this invention, whether polyclonal or
monoclonal, have additional utility in that they may be employed as reagents in
immunoassays, RIA, ELISA, and the like. As used herein, "[...]" is understood to
include antibodies derived from [...] (see, e.g., Jones et al., Nature, 321:522 (1986);
Verhoeyen et al., Science, 239:1534 (1988); Kabat et al., J. Immunol., 147:1709
(1991); Queen et al., Proc. Natl. Acad. Sci. USA, 86:10029 (1989); Gorman et al.,
Proc. Natl. Acad. Sci. USA, 88:4181 (1991); and Hodgson et al., Bio/Technology,
9:421 (1991)). Therefore, this invention also [...] polyclonal or monoclonal
(including chimeric and "humanized") antibodies [...] to epitopes corresponding to
amino acid sequences disclosed herein from human galactokinase. Methods for the
production of polyclonal and monoclonal antibodies are well known; see, for
example, Chap. 11 of Ausubel et al. (supra).

When the antibody is labeled with an analytically detectable reagent, such as
radioactivity, fluorescence, or an enzyme, the antibody can be used to detect the
presence or absence of human galactokinase and/or its quantitative level. In addition,
antibodies (polyclonal or monoclonal) specific for the missense and nonsense
mutations of the present invention are useful for diagnostic purposes. A serum or
tissue sample (e.g., liver, lung, etc.) is obtained and allowed to come in contact with
an antibody or antibody fragment which specifically binds to a mutant human
galactokinase protein of the invention under conditions such that an antigen-antibody
complex is formed between said antibody (or antibody fragment) and said mutant
galactokinase protein. Detection of the presence or absence of said complex is within
the skill of the art (e.g., ELISA, RIA, Western blotting, optical biosensor (e.g.,
BIAcore, Pharmacia Biosensor, Uppsala, Sweden)), and the detection method does
not limit this invention.

This invention also contemplates pharmaceutical compositions
comprising an effective amount of the galactokinase protein of the invention and a
pharmaceutically acceptable carrier. Pharmaceutical compositions of proteinaceous
drugs of this invention are particularly useful for parenteral administration, i.e.,
subcutaneously, intramuscularly or intravenously. Optionally, the galactokinase
protein is surrounded by a membrane-bound vesicle, such as a liposome.

The compositions for parenteral administration will commonly comprise a
solution of the compounds of the invention or a cocktail thereof dissolved in an
acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers may
be employed, e.g., water, buffered water, 0.4% saline, 0.3% glycine, and the like.
These solutions are sterile and generally free of particulate matter. These solutions
may be sterilized by conventional, well-known sterilization techniques. The
compositions may contain pharmaceutically acceptable auxiliary substances as
required to approximate physiological conditions, such as pH adjusting and buffering
agents, etc. The concentration of the compound of the invention in such a
pharmaceutical formulation can vary widely, i.e., from less than about 0.5%, usually
at or at least about 1%, to as much as 15 or 20% by weight, and will be selected
primarily based on fluid volumes, viscosities, etc., according to the particular mode of
administration selected.

Thus, a pharmaceutical composition of the invention for intramuscular
injection could be prepared to contain 1 mL of sterile buffered water and 50 mg of a
compound of the invention. Similarly, a pharmaceutical composition of the invention
for intravenous infusion could be made up to contain 250 mL of sterile Ringer's
solution and 150 mg of a compound of the invention. Actual methods for preparing
parenterally administrable compositions are well known or will be apparent to those
skilled in the art and are described in more detail in, for example, Remington's
Pharmaceutical Science, 15th ed., Mack Publishing Company, Easton, Pennsylvania.
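For orientation, the two example formulations can be converted to concentrations and compared with the range stated above. This is only a sketch: it assumes the concentration is expressed as weight per volume (grams per 100 mL), which approximates "% by weight" for a dilute aqueous carrier, and the helper name `percent_w_v` is hypothetical.

```python
def percent_w_v(mg_solute: float, ml_vehicle: float) -> float:
    """Approximate concentration as % w/v (grams of solute per 100 mL of vehicle)."""
    if ml_vehicle <= 0:
        raise ValueError("vehicle volume must be positive")
    return (mg_solute / 1000.0) / ml_vehicle * 100.0


# Example formulations from the text.
intramuscular = percent_w_v(50, 1)    # 50 mg in 1 mL sterile buffered water
intravenous = percent_w_v(150, 250)   # 150 mg in 250 mL sterile Ringer's solution

print(f"intramuscular: {intramuscular:.2f}% w/v")
print(f"intravenous:   {intravenous:.2f}% w/v")
```

On this approximation the intramuscular example (about 5%) lies within the usual 1 to 20% band, while the intravenous infusion (about 0.06%) falls in the "less than about 0.5%" region, consistent with choosing concentration according to fluid volume.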

The compounds described herein can be lyophilized for storage and
reconstituted in a suitable carrier prior to use. This technique has been shown to be
effective with conventional proteins, and art-known lyophilization and reconstitution
techniques can be employed.

PCT/US95/06743

The physician will determine the dosage of the present therapeutic agents
which will be most suitable, and it will vary with the form of administration and the
particular compound chosen, and furthermore, it will vary with the particular patient
under treatment. He will generally wish to initiate treatment with small
dosages substantially less than the optimum dose of the compound and increase the
dosage by small increments until the optimum effect under the circumstances is
reached. It will generally be found that when the composition is administered orally,
larger quantities of the active agent will be required to produce the same effect as a
smaller quantity given parenterally. The therapeutic dosage will generally be from 1
to 10 milligrams per day and higher, although it may be administered in several
different dosage units.

Depending on the patient's condition, the pharmaceutical composition of the
invention can be administered for prophylactic and/or therapeutic treatments. In
therapeutic applications, compositions are administered to a patient already suffering
from a disease in an amount sufficient to cure or at least partially arrest the disease
and its complications. In prophylactic applications, compositions containing the
present compounds or a cocktail thereof are administered to a patient not already in a
disease state to enhance the patient's resistance.

Single or multiple administrations of the pharmaceutical compositions can be
carried out with dose levels and pattern being selected by the treating physician. In
any event, the pharmaceutical composition of the invention should provide a quantity
of the compounds of the invention sufficient to effectively treat the patient.

This invention also contemplates use of the galactokinase genes of the instant
invention as a diagnostic. For example, some diseases result from inherited
defective genes. These genes can be detected by comparing the sequence of the
defective gene with that of a normal one. Subsequently, one can verify that a
"mutant" gene is associated with galactokinase deficiency by measurement of
galactose. That is, a mutant gene would be associated with (atypically) elevated
levels of galactose in a patient. In addition, one can insert mutant galactokinase
genes into a suitable vector for expression in a functional assay system (e.g., a
colorimetric assay, expression on MacConkey plates, or complementation experiments,
e.g., in a galactokinase-deficient strain of yeast or E. coli) as yet another means to
verify or identify galactokinase mutations. As an example, RNA from an individual
can be transcribed with reverse transcriptase to cDNA, which can then be amplified
by polymerase chain reaction (PCR), cloned into an E. coli expression vector, and
transformed into a galactokinase-deficient strain of E. coli. When grown on

MacConkey indicator plates, galactokinase-deficient cells will produce colonies that
are white in color, whereas cells that have been transformed/complemented with a
functional galactokinase gene will be red (see, e.g., the Examples section). If most to
all of the colonies from an individual are red, then the individual is considered to be
normal with respect to galactokinase activity. If approximately 50% of the colonies
are red (the other 50% white), then that individual is likely to be a carrier for
galactokinase deficiency. If most to all of the colonies are white, then that
individual is likely to be galactokinase deficient. Once "mutant" genes have been
identified, one can then screen the population for carriers of the "mutant"
galactokinase gene. (A carrier is a person in apparent health whose chromosomes
contain a "mutant" galactokinase gene that may be transmitted to that person's
offspring.) In addition, monoclonal antibodies that are specific for the mutant
galactokinase proteins can be used for diagnostic purposes as described above.
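The colony-counting rule above can be written as a small decision function. This is a sketch only: the text gives qualitative bands ("most to all", "approximately 50%"), so the numeric cut-offs below are hypothetical placeholders, and fractions that fall between bands are reported as inconclusive rather than forced into a category.

```python
from typing import Tuple


def classify_by_colonies(
    red: int,
    white: int,
    normal_min: float = 0.8,                           # hypothetical "most to all red"
    carrier_band: Tuple[float, float] = (0.35, 0.65),  # hypothetical "about 50% red"
    deficient_max: float = 0.2,                        # hypothetical "most to all white"
) -> str:
    """Call galactokinase status from red/white MacConkey colony counts."""
    total = red + white
    if total <= 0:
        raise ValueError("no colonies scored")
    frac_red = red / total
    if frac_red >= normal_min:
        return "normal"
    if carrier_band[0] <= frac_red <= carrier_band[1]:
        return "likely carrier"
    if frac_red <= deficient_max:
        return "likely deficient"
    return "inconclusive"


for counts in [(95, 5), (48, 52), (3, 97), (75, 25)]:
    print(counts, classify_by_colonies(*counts))
```

Keeping an explicit "inconclusive" outcome reflects that the text only defines the three clear-cut cases.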
Individuals carrying mutations in the human galactokinase gene may be
detected at the DNA level by a variety of techniques. Nucleic acids used for
diagnosis (genomic DNA, mRNA, etc.) may be obtained from a patient's cells, such
as from blood, urine, saliva, tissue biopsy (e.g., chorionic villi sampling or removal
of amniotic fluid cells), and autopsy material. The genomic DNA may be used
directly for detection or may be amplified enzymatically by using PCR, ligase chain
reaction (LCR), strand displacement amplification (SDA), etc. (see, e.g., Saiki et al.,
Nature, 224:163-166 (1986); Bej et al., Crit. Rev. Biochem. Molec. Biol., 26:301-334
(1991); Birkenmeyer et al., J. Virol. Meth., 25:117-126 (1991); Van Brunt, J.,
Bio/Technology, 8:291-294 (1990)) prior to analysis. RNA may also be used for
the same purpose. The RNA can be reverse-transcribed and amplified at one time
with PCR-RT (polymerase chain reaction - reverse transcriptase) or
reverse-transcribed to an unamplified cDNA. As an example, PCR primers complementary
to the nucleic acid of the instant invention can be used to identify and analyze
galactokinase mutations. For example, deletions and insertions can be detected by a
change in size of the amplified product in comparison to the normal galactokinase
genotype. Point mutations can be identified by hybridizing amplified DNA to
radiolabeled galactokinase RNA (of the invention) or, alternatively, radiolabeled
galactokinase antisense DNA sequences (of the invention). Perfectly matched
sequences can be distinguished from mismatched duplexes by RNase A digestion or
by differences in melting temperatures (Tm). Such a diagnostic would be particularly
useful for prenatal and even neonatal testing.
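The size-based screen for deletions and insertions amounts to comparing the observed amplicon length with the length expected for the normal genotype. In the minimal sketch below, the function name, the 346 bp example size and the sizing tolerance (standing in for gel resolution) are assumptions; a point mutation leaves the size unchanged, which is why the text turns to hybridization-based methods for that case.

```python
def compare_amplicon(observed_bp: int, expected_bp: int, tolerance_bp: int = 2) -> str:
    """Flag a possible insertion or deletion from a PCR product size."""
    diff = observed_bp - expected_bp
    if abs(diff) <= tolerance_bp:
        return "no size change"
    kind = "insertion" if diff > 0 else "deletion"
    return f"possible {kind} of about {abs(diff)} bp"


print(compare_amplicon(346, 346))  # no size change
print(compare_amplicon(331, 346))  # possible deletion of about 15 bp
print(compare_amplicon(358, 346))  # possible insertion of about 12 bp
```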
In addition, point mutations and other sequence [...]
techniques, e.g., direct DNA [...] well-known [...]
A sequencing primer is used with double-stranded PCR product or a single-stranded
template molecule generated by a modified PCR. The sequence determination is
performed by conventional procedures [...] or by automatic sequencing procedures [...].
[...] also be used as probes to detect specific DNA segments. The sensitivity of
this method is greatly enhanced when combined with PCR. [...] may correlate to a
change in galactokinase [...] serve as markers for various polymorphisms.

Genetic testing based on DNA sequence differences may be achieved by
detection of alterations in electrophoretic mobility of DNA fragments in gels with or
without denaturing agents. [...] high resolution [...]. DNA fragments of different
sequences may be distinguished [...] mobilities of different [...].
[...] small deletions, may be detected as changes in the migration pattern of DNA
heteroduplexes in non-denaturing gel electrophoresis (see, e.g., Nagamine et al.,
Am. J. Hum. Genet., 45:337-339 (1989)).

Sequence changes at specific locations may also be revealed by nuclease
protection assays, such as RNase and S1 protection, or the chemical cleavage method
(e.g., Cotton et al., [...] 85:4397-4401 (1985)), direct DNA sequencing, or the use of
restriction enzymes (e.g., restriction fragment length polymorphisms (RFLP)), in which
the number and size of restriction fragments [...] (i.e., endonuclease restriction
sequence) [...] used to identify large (i.e., greater than 100 base pair) deletions and insertions.
In addition to more conventional gel electrophoresis and DNA sequencing,
mutations (e.g., microdeletions, aneuploidies, translocations, inversions) can also be
detected by in situ analysis (see, e.g., Keller et al., DNA Probes, 2nd Ed., Stockton
Press, New York, N.Y., USA (1993)). That is, DNA (or RNA) sequences in cells
can be analyzed for mutations without isolation and/or immobilization onto a
membrane. Fluorescence in situ hybridization (FISH) is presently the most
commonly applied method, and numerous reviews of FISH have appeared. See, e.g.,
Trachuck et al., Science, 250:559-562 (1990), and Trask et al., Trends Genet., 7:149-154
(1991), which are incorporated herein by reference for background
purposes. Hence, by using nucleic acids based on the structure of specific genes,
e.g., galactokinase, one can develop diagnostic tests for galactokinase deficiency.

In addition, some diseases are a result of, or are characterized by, changes in
gene expression, which can be detected by changes in the mRNA. Alternatively, the
galactokinase gene can be used as a reference to identify individuals expressing a
decreased level of galactokinase, e.g., by Northern blotting or in situ hybridization.

Defining appropriate hybridization conditions is within the skill of the art.
See, e.g., "Current Protocols in Mol. Biol.," Vol. I & II, Wiley Interscience, Ausubel
et al. (ed.) (1992). Probing technology is well known in the art, and it is appreciated
that the size of the probes can vary widely, but it is preferred that the probe be at least
15 nucleotides in length. It is also appreciated that such probes can be, and
preferably are, labeled with an analytically detectable reagent to facilitate identification of
the probe. Useful reagents include, but are not limited to, radioactivity, fluorescent
dyes, or enzymes capable of catalyzing the formation of a detectable product. As a
general rule, the more stringent the hybridization conditions, the more closely related
the recovered genes will be.

Also within the scope of this invention are antisense oligonucleotides
predicated upon the sequences disclosed herein for human galactokinase. Synthetic
oligonucleotides or related antisense chemical structural analogs are designed to
recognize and specifically bind to a target nucleic acid encoding galactokinase and
galactokinase mutations. The general field of antisense technology is illustrated by
the following disclosures, which are incorporated herein by reference for purposes of
background (Cohen, J.S., Trends in Pharm. Sci., 10:435 (1989); and Weintraub, H.M.,
Scientific American, Jan. (1990), at page 40).
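At its simplest, an antisense oligonucleotide is the reverse complement of its target, so that it pairs antiparallel with the target strand. The sketch below derives one from the first eight codons of the SEQ ID NO:4 coding sequence and enforces the 15-nucleotide minimum preferred for probes above; the function name and the choice of target segment are illustrative, and the sketch ignores practical design factors such as target accessibility, off-target matches and backbone chemistry.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")  # DNA complement; U (RNA) pairs with A


def antisense_oligo(target: str, min_len: int = 15) -> str:
    """Return the DNA oligonucleotide complementary and antiparallel to a target."""
    seq = target.upper().replace(" ", "")
    if len(seq) < min_len:
        raise ValueError(f"target is shorter than {min_len} nucleotides")
    if set(seq) - set("ACGTU"):
        raise ValueError("target contains non-nucleotide characters")
    return seq.translate(COMPLEMENT)[::-1]


# Codons 1-8 (Met Ala Ala Leu Arg Gln Pro Gln) of SEQ ID NO:4.
target = "ATG GCT GCT TTG AGA CAG CCC CAG"
print(antisense_oligo(target))  # CTGGGGCTGTCTCAAAGCAGCCAT
```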

Transgenic, non-human animals may be obtained by transfecting appropriate
fertilized eggs or embryos of a host with nucleic acids encoding human galactokinase
disclosed herein; see, for example, U.S. Patents 4,736,866; 5,175,385; 5,175,384 and
5,175,386. The resultant transgenic animal may be used as a model for the study of 
Purification of Human Galactokinase [...]:

[...] adsorbed onto DEAE-Sephacel®. The material was eluted, precipitated with
ammonium sulfate and then run through a sizing column (Sephadex G-[...]®).
[...] polyacrylamide gel electrophoresis and then Western blotted using
[...]:154 (1988)). Minute amounts of galactokinase [...]
[...] from multiple rounds of protein purification. After [...]
[...] fragments are presented below:
[SEQ ID NO:1]

Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu-
Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg

[SEQ ID NO:2]

His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln-
Ala Ala Asp Gly Ala Lys

[SEQ ID NO:3]

Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys-
Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys
The fragments were compared with peptide sequences encoded by partially
sequenced cDNAs. The cDNAs (also known as expressed
sequence tags or ESTs) were obtained from Human Genome Sciences, Inc.
(Rockville, MD, USA). The best alignments occurred with an EST sequence from a
human osteoclastoma stromal cell library (SEQ ID NO:1 showed 100% identity over
18 contiguous amino acids) and an EST sequence from a human pituitary library (SEQ
ID NO:2 showed 95.5% identity over 22 contiguous amino acids). A full-length
cDNA from the human osteoclastoma stromal cell library was identified and
sequenced (SEQ ID NO:4) in its entirety on an automated ABI 373A Sequencer.
Sequencing was confirmed on both strands. The corresponding amino acid sequence
(SEQ ID NO:4) was compared against the peptide fragments identified above. SEQ
ID NO:1 corresponds to amino acids 38-68 of the full-length human galactokinase
protein. Similarly, SEQ ID NOs: 2 and 3 correspond to amino acids 367-388 and
167-195, respectively, of human galactokinase.
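The peptide-to-cDNA assignments above can be checked by translating the relevant stretch of SEQ ID NO:4 and searching it for a fragment. The sketch below does this for SEQ ID NO:2 using codons 361-392 from the sequence listing (reading the Tyr codons as TAC, as the amino acid line requires), with a plain ungapped identity score; it is an illustration, not the alignment software actually used, and the helper names are hypothetical. For comparison, the 95.5% reported for the pituitary EST corresponds to 21 of 22 positions.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (spaces ignored) to one-letter amino acids."""
    seq = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Codons 361-392 of SEQ ID NO:4.
cds_361_392 = ("GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC "
               "ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG")
seq_id_2 = "HIQEHYGGTATFYLSQAADGAK"  # SEQ ID NO:2 in one-letter code

protein = translate(cds_361_392)
offset = protein.find(seq_id_2)
if offset < 0:
    raise ValueError("fragment not found in translated region")
first = 361 + offset
last = first + len(seq_id_2) - 1
print(f"SEQ ID NO:2 maps to residues {first}-{last}")                      # 367-388
print(percent_identity(seq_id_2, protein[offset:offset + len(seq_id_2)]))  # 100.0
print(round(100 * 21 / 22, 1))                                             # 95.5
```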

Analysis of the Human Galactokinase Gene:

A comparison of the amino acid sequence for human galactokinase with that of
E. coli galactokinase (Debouck et al., Nucleic Acids Res., 12:1841-1853 (1985)) shows
61% similarity and 44.5% identity. Further comparison with another purported
human galactokinase gene (GK2) (Lee et al., Proc. Natl. Acad. Sci. USA, 89:10887-10891
(1992)) shows 54% similarity and 34.6% identity at the amino acid level.
Furthermore, the GK2 gene maps to human chromosome 15, in contrast to
the gene of the present invention, which maps to human chromosome 17, position q24,
as determined by fluorescence in situ hybridization (FISH) analysis.

SEQ ID NO:4 was hybridized against a Northern blot containing human
messenger RNA from placenta, brain, skeletal muscle, kidney, intestine, heart, lung
and liver according to standard procedures (see, e.g., Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
1989). Hybridization was strongest with human liver and lung tissue.

Galactokinase Complementation:

SEQ ID NO:4 was subcloned into an E. coli vector, plasmid pBluescript
[Stratagene]. When transformed into C600K-, a galactokinase-deficient strain, the
transformed E. coli grew on MacConkey agar plates containing 1% galactose (and
ampicillin at 50 µg/ml for plasmid selection) and produced brick-red colonies,
indicating sugar fermentation. Specifically, the red color is due to the action of acids
produced by galactose fermentation upon bile salts and the indicator (neutral red) in
MacConkey medium.

Expression in Mammalian Cells:

SEQ ID NO:4 was also subcloned into COS-1 cells [ATCC CRL 1650]. The
cells were transfected, grown, and cell lysates were prepared. The lysates were
assayed by a 14C galactokinase assay as described by Stambolian et al. (Exp. Eye Res.,
25:231-237 (1984)), which is hereby incorporated by reference in its entirety. When
expressed in transiently transfected COS cells, galactokinase activity was tenfold
higher than control levels (6600 vs. 640 counts per minute; repeated three times).
These results definitively confirm that SEQ ID NO:4 encodes a full-length,
biologically active human galactokinase gene.
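The tenfold figure is the ratio of sample to control counts. A minimal check, assuming the reported counts are used directly without background subtraction (the text does not say whether any was applied); the helper name is illustrative:

```python
def fold_over_control(sample_cpm: float, control_cpm: float) -> float:
    """Ratio of sample activity to control activity (counts per minute)."""
    if control_cpm <= 0:
        raise ValueError("control counts must be positive")
    return sample_cpm / control_cpm


fold = fold_over_control(6600, 640)
print(f"{fold:.1f}-fold over control")  # 10.3-fold over control
```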

The nucleic acid molecule of the invention can also be subcloned into an
expression vector to produce high levels of human galactokinase (either fused to
another protein, e.g., operatively linked at the 5' end with another coding sequence, or
unfused) in transfected cells. For mammalian cells, the expression vector would
optionally encode a neomycin resistance gene to select for transfectants on the basis of
ability to grow in G418 and a dihydrofolate reductase gene, which permits
amplification of the transfected gene in DHFR- cells. The plasmid can then be
introduced into host cell lines, e.g., CHO ACC98, a nonadherent DHFR- cell line
adapted to grow in serum-free medium, and human embryonic kidney 293 cells
(ATCC CRL 1573), and transfected cell lines can be selected by G418 resistance.

Human Galactokinase Gene - Genomic Sequence:

A full-length galactokinase genomic gene coding region was identified from a
lambda phage (λ FIX II) human genomic library (made from human placenta tissue)
using the galK cDNA as a probe. One isolate, designated clone 17, was deposited on
3 May 1995 with the American Type Culture Collection (ATCC), Rockville, MD,
USA, under accession number ATCC 97135, and has been accepted as a patent
deposit in accordance with the Budapest Treaty of 1977 governing the deposit of
microorganisms for the purposes of patent procedure.

The genomic gene coding region is divided into at least 8 exons isolated from
4 DNA fragments. The arrangement is depicted in Figure 1. The DNA sequence was
determined by using multiple oligonucleotide PCR primers corresponding to the galK
cDNA sequence (i.e., corresponding to galK genomic exons) as well as
oligonucleotide PCR primers subsequently designed that correspond to non-coding
regions (i.e., galK genomic introns). Thus, the structure of the galactokinase genomic
gene is summarized in Table 1 below (see also Figure 2 and SEQ ID NO:7):
Table 1

Genomic Galactokinase Gene

Exon #   Amino Acids Encoded   PCR Primer # / [SEQ ID NO]
1        1-55                  3333/[8]; 3334/[9]; 3598/[10]; 3599/[11]
2        56-118                1888/[12]; 3332/[13]; 3604/[14]; 3605/[15]
3        119-158               3331/[16]; 3606/[17]
4        159-204               1657/[18]; 3034/[19]
5        205-264               3330/[20]; 3607/[21]
6        265-315               1539/[22]; 2665/[23]
7        316-369               1891/[24]; 2665/[25]
8        370-392               2665/[26]; 2666/[27]; 2667/[28]
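Table 1 can be used as a lookup from residue number to exon. The sketch below encodes the table (taking exon 2 as residues 56-118, the interval left between exons 1 and 3), checks that the exons tile the 392-residue coding region given by CDS 29..1204 in SEQ ID NO:4, and locates residues 32 and 80, the sites of the missense and nonsense mutations described later; the names are illustrative.

```python
from bisect import bisect_left

# (exon, first residue, last residue) from Table 1; exon 2 is taken as the
# interval between exons 1 and 3.
EXONS = [(1, 1, 55), (2, 56, 118), (3, 119, 158), (4, 159, 204),
         (5, 205, 264), (6, 265, 315), (7, 316, 369), (8, 370, 392)]
LAST_RESIDUES = [last for _, _, last in EXONS]


def exon_for_residue(residue: int) -> int:
    """Return the exon whose residue range contains `residue`."""
    idx = bisect_left(LAST_RESIDUES, residue)
    if idx == len(EXONS) or residue < EXONS[idx][1]:
        raise ValueError(f"residue {residue} is outside Table 1")
    return EXONS[idx][0]


# The exons should tile the coding region without gaps or overlaps ...
assert all(nxt[1] == cur[2] + 1 for cur, nxt in zip(EXONS, EXONS[1:]))
# ... and end at the last codon of CDS 29..1204 (1176 nt = 392 codons).
assert EXONS[-1][2] == (1204 - 29 + 1) // 3

print(exon_for_residue(32), exon_for_residue(80))  # 1 2
```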






Galactokinase Deficiency Marker/Gene:

A fibroblast cell line (GM00334), derived from a patient with galactokinase
deficiency, was obtained from the Coriell Institute for Medical Research, 401 Haddon
[...]. Cytoplasmic RNA (1 µg) was reverse transcribed with oligonucleotide primers
1823 [SEQ ID [...] [...] DNA product was purified [...]

Twelve cDNAs in total were sequenced (representing cloned PCR products of
fibroblasts from normal controls (i.e., persons not exhibiting galactokinase deficiency)
[...] for G at position 122 of the human galactokinase gene (SEQ ID NO:4)
[...] (SEQ ID NO:5)
[...] endonuclease restriction site (i.e.,
TGG^CCA) on the mutant allele. This restriction site was then used to rapidly screen
for the mutant allele in the parents of the patient with galactokinase deficiency.
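Why the G to A change at position 122 creates the MscI site (recognition sequence TGG^CCA) can be seen by scanning codons 25-40 of SEQ ID NO:4 before and after the substitution. The sketch below assumes the fragment and numbering from the sequence listing (the fragment starts at nucleotide 101, and codon 32 starts at nucleotide 122); the helper names are illustrative.

```python
from typing import List

MSCI_SITE = "TGGCCA"  # MscI recognition sequence (cuts TGG^CCA)
FRAGMENT_START = 101  # SEQ ID NO:4 nucleotide number of the fragment's first base

# Codons 25-40 of SEQ ID NO:4 (normal allele).
normal = ("GGG GCC GAG CCC GAG CTG GCC GTG "
          "TCA GCG CCG GGC CGC GTC AAC CTC").replace(" ", "")
pos = 122 - FRAGMENT_START  # 0-based index of nucleotide 122
assert normal[pos] == "G"
mutant = normal[:pos] + "A" + normal[pos + 1:]

CODONS = {"GTG": "Val", "ATG": "Met"}  # subset of the standard genetic code used here


def find_sites(seq: str, site: str = MSCI_SITE) -> List[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


codon32 = slice(pos, pos + 3)  # codon 32 starts at nucleotide 122
print("normal:", CODONS[normal[codon32]], find_sites(normal))  # normal: Val []
print("mutant:", CODONS[mutant[codon32]], find_sites(mutant))  # mutant: Met [16]
```

The substitution turns CTG GCC GTG into CTG GCC ATG, so the bases spanning codons 30 to 32 now read TGGCCA.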

[...] DNA sequence was
determined, including a portion of the flanking intron sequences. Oligonucleotide
primers (X2.5OUT [SEQ ID NO:31] and X2.3OUT [SEQ ID NO:32]) were
[...] of a 346 bp DNA
fragment of the genomic DNA. The PCR product was analyzed for the point
mutation via RFLP, that is, the presence of a newly created MscI site detected by
electrophoresis on a 1.5% agarose gel. A "normal" allele remains uncut with the
enzyme MscI and thus migrates as a 346 bp fragment on an agarose gel. The PCR
[...]

In contrast, PCR products from the parents of this patient, followed by MscI
digestion, resulted in three fragments (346, 193 and 153 bp), which is consistent with a
heterozygous pattern for the G to A base change. That is, the parents were
carriers of the same mutation.
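The band patterns can be summarized as a simple genotype call. In the sketch below the normal (346 bp) and heterozygous (346 + 193 + 153 bp) patterns come from the text; the homozygous-mutant pattern (193 + 153 bp only) is the expected result of complete digestion and is an assumption here, as is exact matching of band sizes (real gel calls would allow a sizing tolerance). The function name is illustrative.

```python
from typing import Iterable

NORMAL_BANDS = frozenset({346})       # uncut: no MscI site
MUTANT_BANDS = frozenset({193, 153})  # cut once at the new MscI site


def genotype_from_bands(bands_bp: Iterable[int]) -> str:
    """Call the G122A genotype from MscI-digested PCR band sizes (bp)."""
    bands = frozenset(bands_bp)
    if bands == NORMAL_BANDS:
        return "homozygous normal"
    if bands == MUTANT_BANDS:
        return "homozygous G122A"
    if bands == NORMAL_BANDS | MUTANT_BANDS:
        return "heterozygous carrier"
    return "unexpected pattern"


assert sum(MUTANT_BANDS) == 346  # the two digestion products account for the amplicon
print(genotype_from_bands([346, 193, 153]))  # heterozygous carrier
```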

[...] enzymatic
activity, a cDNA clone containing the G to A base change was subcloned into COS
cells and assayed for galactokinase activity as previously described. COS cells
transfected with cDNA encoding the missense mutation had the same level of
[...]. COS cells transfected with the non-mutant galactokinase cDNA (SEQ ID NO:4) had a
fifty-fold higher activity compared to the host COS cells (i.e., control). These results
support the Val32 to Met32 substitution as the cause of the decreased enzymatic
activity.

Another mutation was discovered in an unrelated patient having cataracts and
diagnosed as galactokinase deficient (galactokinase activity was found to be close to
zero). Genomic DNA was isolated from lymphoblastoid cell lines and sequenced by
automated sequencing on an ABI 373A sequencer. A single base substitution of T for
G resulted in an in-frame nonsense codon (i.e., TAG) at amino acid position 80 [SEQ
ID NO:6]. This mutation causes premature termination of human galactokinase,
resulting in a truncated protein of 79 amino acids that would be expected to be
non-functional. (The genomic DNA of the parents of this patient was heterozygous for
this mutation; the parents were therefore not galactokinase deficient.)
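The effect of the T for G substitution can be traced codon by codon. The sketch below uses codons 73-88 from the SEQ ID NO:4 listing, changes the first base of codon 80 (GAG, Glu) to T, and reports where translation would stop; the nucleotide number it prints is derived from the CDS start at nucleotide 29 rather than stated in the text, and the helper name is illustrative.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}
CDS_START = 29  # first nucleotide of the ATG start codon in SEQ ID NO:4

# Codons 73-88 of SEQ ID NO:4.
normal = "GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG".split()


def first_stop(codons: List[str], first_residue: int) -> Optional[int]:
    """Return the residue number of the first stop codon, or None if there is none."""
    for offset, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return first_residue + offset
    return None


mutant = list(normal)
i = 80 - 73
mutant[i] = "T" + mutant[i][1:]  # GAG (Glu80) -> TAG (stop)

stop = first_stop(mutant, 73)
print(first_stop(normal, 73))    # None
print(stop, stop - 1)            # 80 79  (stop codon, residues retained)
print(CDS_START + 3 * (80 - 1))  # 266, first base of codon 80
```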



The above description and examples fully disclose the invention, including
preferred embodiments thereof. Those skilled in the art will recognize, or be able to
ascertain using no more than routine experimentation, many equivalents to the specific
embodiments herein. Such equivalents are intended to be within the scope of the
following claims.
SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: Bergsma, Derk J.
Stambolian, Dwight

(ii) TITLE OF INVENTION: Human Galactokinase Gene

(iii) NUMBER OF SEQUENCES: 32

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: SmithKline Beecham Corp./Corporate Intellectual Property
(B) STREET: 709 Swedeland Road/UW2220
(C) CITY: King of Prussia
(D) STATE: Pennsylvania
(E) COUNTRY: USA
(F) ZIP: 19406-0939

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US94/10825
(B) FILING DATE: 23-SEP-1994

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Sutton, Jeffrey A.
(B) REGISTRATION NUMBER: 34,028
(C) REFERENCE/DOCKET NUMBER: P50268-1
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 610-270-5024
(B) TELEFAX: 610-270-5090

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu
1               5                   10                  15

Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg
            20                  25                  30

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln
1               5                   10                  15

Ala Ala Asp Gly Ala Lys
            20

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys
1               5                   10                  15

Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys
            20                  25



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG      52
                               Met Ala Ala Leu Arg Gln Pro Gln
                               1               5

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC     100
Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe
    10                  15                  20

GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC     148
Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu
25                  30                  35                  40

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT     196
Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala
                45                  50                  55

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG     244
Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu
            60                  65                  70

GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG     292
Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu
        75                  80                  85

CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT     340
Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro
    90                  95                  100

CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC     388
Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala
105                 110                 115                 120

CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG     436
Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly
                125                 130                 135

GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC     484
Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe
            140                 145                 150

CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG     532
Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln
        155                 160                 165

GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC     580
Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile
    170                 175                 180

ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC     628
Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu
185                 190                 195                 200

ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC     676
Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro
                205                 210                 215

AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC     724
Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala
            220                 225                 230

TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG     772
Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg
        235                 240                 245

GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG     820
Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu
    250                 255                 260

GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC     868
Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His
265                 270                 275                 280

GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA     916
Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg
                285                 290                 295

CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC     964
Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg
            300                 305                 310

TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG    1012
Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu
        315                 320                 325

GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG    1060
Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr
    330                 335                 340

GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT    1108
Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala
345                 350                 355                 360

GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC    1156
Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala
                365                 370                 375

ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG    1204
Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu
            380                 385                 390

TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC  1264

TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG  1324

AAAAAAAAAA AAAAAAAAAC TCGAG                                        1349



(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG 
52 



10 



Met Ala Ala Leu Arg Gin Pro Gin 

1 5 

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC 
100 

Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe 
15 10 15 20 

GGG GCC GAG CCC GAG CTG GCC ATG TCA GCG CCG GGC CGC GTC AAC CTC 
148 

Gly Ala Glu Pro Glu Leu Ala Met Ser Ala Pro Gly Arg Val Asn Leu 
20 25 30 35 40 

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT 
196 

He Gly Glu His Thr Asp Tyr Asn Gin Gly Leu Val Leu Pro Met Ala 
25 45 50 55 

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG 
244 

Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu 
30 60 65 7 0 

GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG 
292 

Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gin Arg Leu 
35 75 80 85 

CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT 
340 

Gin Phe Pro Leu Pro Thr Ala Gin Arg Ser Leu Glu Pro Gly Thr Pro 
40 90 95 100 

CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC 
388 , 

Arg Trp Ala Asn Tyr Val Lys Gly Val He Gin Tyr Tyr Pro Ala Ala 
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5 105 110 



10 



15 



35 



120 



CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG 
436 

Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly 

125 130 

GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC 
484 

Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe 
140 145 150 

CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG 
532 

Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln 
155 160 165 

GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC 
580 

Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile 
170 175 180 

ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC 
628 

Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu 
185 190 195 200 

ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC 
676 

Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro 
205 210 215 

AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC 
724 

Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala 
220 225 230 

TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG 
772 

Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg 
235 240 245 

GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG 
820 

Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu 
250 255 260 

GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC 
868 

Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His 
265 270 275 280 

GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA 
916 

Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg 
285 290 295 

CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC 
964 

Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg 
300 305 310 

TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG 
1012 

Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu 
315 320 325 

GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG 
1060 

Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr 
330 335 340 

GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT 
1108 

Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala 
345 350 355 360 

GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC 
1156 

Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala 
365 370 375 

ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG 
1204 

Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu 
380 385 390 

TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC 
1264 

TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG 
1324 

AAAAAAAAAA AAAAAAAAAC TCGAG 
1349 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1349 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 29..265 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG 
52 

Met Ala Ala Leu Arg Gln Pro Gln 
1 5 

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC 
100 

Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe 
10 15 20 

GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC 
148 

Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu 
25 30 35 40 

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT 
196 

Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala 
45 50 55 

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG 
244 

Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu 
60 65 70 

GTG TCT CTC CTC ACC ACC TCT TAGGGTGCCG ATGAGCCCCA GCGGCTGCAG 
295 

Val Ser Leu Leu Thr Thr Ser 
75 

TTTCCACTGC CCACAGCCCA GCGCTCGCTG GAGCCTGGGA CTCCTCGGTG GGCCAACTAT 
355 

GTCAAGGGAG TGATTCAGTA CTACCCAGCT GCCCCCCTCC CTGGCTTCAG TGCAGTGGTG 
415 

GTCAGCTCAG TGCCCCTGGG GGGTGGCCTG TCCAGCTCAG CATCCTTGGA AGTGGCCACG 
475 

TACACCTTCC TCCAGCAGCT CTGTCCAGAC TCGGGCACAA TAGCTGCCCG CGCCCAGGTG 
535 

TGTCAGCAGG CCGAGCACAG CTTCGCAGGG ATGCCCTGTG GCATCATGGA CCAGTTCATC 
595 

TCACTTATGG GACAGAAAGG CCACGCGCTG CTCATTGACT GCAGGTCCTT GGAGACCAGC 
655 

CTGGTGCCAC TCTCGGACCC CAAGCTGGCC GTGCTCATCA CCAACTCTAA TGTCCGCCAC 
715 

TCCCTGGCCT CCAGCGAGTA CCCTGTGCGG CGGCGCCAAT GTGAAGAAGT GGCCCGGGCG 
775 

CTGGGCAAGG AAAGCCTCCG GGAGGTACAA CTGGAAGAGC TAGAGGCTGC CAGGGACCTG 
835 

GTGAGCAAAG AGGGCTTCCG GCGGGCCCGG CACGTGGTGG GGGAGATTCG GCGCACGGCC 
895 

CAGGCAGCGG CCGCCCTGAG ACGTGGCGAC TACAGAGCCT TTGGCCGCCT CATGGTGGAG 
955 

AGCCACCGCT CACTCAGAGA CGACTATGAG GTGAGCTGCC CAGAGCTGGA CCAGCTGGTG 
1015 

GAGGCTGCGC TTGCTGTGCC TGGGGTTTAT GGCAGCCGCA TGACGGGCGG TGGCTTCGGT 
1075 

GGCTGCACGG TGACACTGCT GGAGGCCTCC GCTGCTCCCC ACGCCATGCG GCACATCCAG 
1135 

GAGCACTACG GCGGGACTGC CACCTTCTAC CTCTCTCAAG CAGCCGATGG AGCCAAGGTG 
1195 

CTGTGCTTGT GAGGCACCCC CAGGACAGCA CACGGTGAGG GTGCGGGGCC TGCAGGCCAG 
1255 

TCCCACGGCT CTGTGCCCGG TGCCATCTTC CATATCCGGG TGCTCAATAA ACTTGTGCCT 
1315 

CCAATGTGGA AAAAAAAAAA AAAAAAAACT CGAG 
1349 
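Both cDNA listings open with the same 28-nucleotide leader (GAATTCGGCA CGAGTGCAGG CGCGCGTC), so the ATG begins at nucleotide 29 in each, and the CDS features can be converted to codon numbers with simple arithmetic. The Python sketch below assumes 1-based, inclusive coordinates as used in the FEATURE entries; the function names are illustrative and do not come from the source. On that assumption, 29..1204 gives 392 codons, 29..265 gives 79 codons, and nucleotide 266 (TAG in SEQ ID NO:6, GAG/Glu in SEQ ID NO:5) opens codon 80.

```python
# Hedged sketch: converts the 1-based, inclusive CDS coordinates printed in the
# FEATURE entries into codon numbers. Function names are illustrative only.

CDS_START = 29  # first base of the ATG in both SEQ ID NO:5 and SEQ ID NO:6


def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive CDS interval (must divide by 3)."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


def codon_number(position: int, cds_start: int = CDS_START) -> int:
    """1-based codon number that contains a 1-based nucleotide position."""
    if position < cds_start:
        raise ValueError("position lies upstream of the start codon")
    return (position - cds_start) // 3 + 1


def codon_span(codon: int, cds_start: int = CDS_START) -> tuple[int, int]:
    """Inclusive nucleotide interval occupied by a 1-based codon number."""
    first = cds_start + 3 * (codon - 1)
    return first, first + 2


if __name__ == "__main__":
    print(codon_count(29, 1204))  # 392 codons for SEQ ID NO:5
    print(codon_count(29, 265))   # 79 codons for SEQ ID NO:6
    print(codon_number(266))      # 80: the TAG that follows the SEQ ID NO:6 CDS
    print(codon_span(32))         # (122, 124)
```

The same helper places codon 32, where SEQ ID NO:5 reads ATG and SEQ ID NO:6 reads GTG, at nucleotides 122 to 124.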
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7676 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CCGAGCATCC CGCGCCGACG GGTCTGTGCC GGAGCAGCTG TGCAGAGCTG CAGGCGCGCG 
60 

TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT 
120 

TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC 
180 

TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA 
240 

CGGGGAGCCC CTAGCCCGCC GCCGCCTGTC CCGGTCGCCG AGGAGGGCGG GCCTCGGGGA 
300 

CGCTGGGGGC GAGTTCTTCC CGCGGGAGAT GTGGGGCGGG CAGCTGCGCC TGGAGCACCG 
360 

GTGCACGGAA GAGTCCCCGG GACAGGCTGT TCCCCACGTT GGAAGGGAGG AAGCGAAGAA 
420 

GTGGTCCCCA GAGGGTGCGC GGCCGCCTCT TGGCTCAAGC CCGCCCTCTG GGGGCTGGGG 
480 

CTCCTCGCCT TCAACCTGGG AGCATGTTCC CCTTAAACTG TGAGGCCCTG TGTGCCACGC 
540 

AGAAGGGGAC ACTCCGCGCC TCCGGCCACC GTGGGGCCCC AACCGCAGAC CTGGGCGAAC 
600 

GTAGCCTTCT GGCCCAGCCC GTTCAATTTA CAGAGGAGGA AACTGAGGCC TAGAGAGGCC 
660 

CAGTGAACTG CTGGAGGTCA CACAGCAGGT TCTTGGCGGG GCTGCGACTT GGGAGTGAGG 
720 

ACTCCCAGCT TTCAGCGGGG GGCGCTTTCC GCCCCATCTG CAGCTTGGGG AGTGCACAGG 
780 

TACAGGATGT CCAGAGCCAC CCAAAATGTA AAGGCTTTGG AGCTCCAGTG ATCTGTTTTC 
840 

CCTTTGGGCT AAGCTCTCCC CCCTTGCCCC ACAGCTCAGG GCAGAGTCCA GGTCTGTGCT 
900 

CCAGCTGCAG CCGCCCCGCC CCTGAAGACC TAAGGGGGCA GGGCTCAAGC CCCCAAGGTC 
960 

AGCTGGCCCT CAGGATCTTC CCTGCGACGC TGAACCTGGA GGTTCAGAAC CTGATGACTG 
1020 

TGGAGGCATC AGAACCTCGG CTGGAGGCAG TGTCATTGGA GAGGCTTACT CCAGCTGGCG 
1080 

GAAGCCTCAC GTACTGCTTG TCTCTCCTGC CAGGCTCTGG AGCTCATGAC GGTGCTGGTG 
1140 

GGCAGCCCCC GCAAGGATGG GCTGGTGTCT CTCCTCACCA CCTCTGAGGG TGCCGATGAG 
1200 

CCCCAGCGGC TGCAGTTTCC ACTGCCCACA GCCCAGCGCT CGCTGGAGCC TGGGACTCCT 
1260 

CGGTGGGCCA ACTATGTCAA GGGAGTGATT CAGTACTACC CAGGTATGGG GCCCAGGCCT 
1320 

GAGCCAAGTC CTCACTGATA CTAGGAGTGC CACCTCACAG CCACAGAGCC CATTCATTTG 
1380 

TCTGATACAC TGTGGGGAAG GCTTGTAGAG TGGAGCATCC CATTGTACAG ATGAGGAAAC 
1440 

TGATGCCCCC AGAAGGTCGG GAACTTGCCC TGGGTTTCCC GTGACCTGAT TGGAGGAGCC 
1500 

AGGATTTGAA CCCCAGCCTT TTTTCCCTCC AGAGCCCTAA ACCAGGAGGA CAATTAGAAG 
1560 

TGTCCCAGCA ACCTCAGAGG GTGGGAAAAT GGAGGGGAGT GGGTCCCTTG GGCCAGCAGG 
1620 

TTGGTGGGGT TCTTGACAAT TGAGACACAC ACCTAGAAAC AGTTGCTAGG CCGTTGCTGC 
1680 

CCTTCCCGCC AGGACACCTG CCCTTCCTGT CCAATCCTCC CAGGCAGCCT CTCTTACCAT 
1740 

CACCTGTTCT TTCCCCCTGC AGCTGCCCCC CTCCCTGGCT TCAGTGCAGT GGTGGTCAGC 
1800 

TCAGTGCCCC TGGGGGGTGG CCTGTCCAGC TCAGCATCCT TGGAAGTGGC CACGTACACC 
1860 

TTCCTCCAGC AGCTCTGTCC AGGTACCAGC TAGGCCCCAG CCCTGACCCA GCCCTCCTTC 
1920 

CCTGAGGTCT CCAGGTGGTC CCAGCTTCTA CTATGCCTTA TGGAGGGGGT GGCAGGGAAT 
1980 

CTCCCTGGAG TGTCATTGAA GCCACTGCTG CTTCCACCAG CCCTAGCCTC CCCACCTCAC 
2040 



WO 96/09374 
PCT/US95/06743 

CCTGTACTGC AGACTCGGGC ACAATAGCTG CCCGCGCCCA GGTGTGTCAG CAGGCCGAGC 
2100 

ACAGCTTCGC AGGGATGCCC TGTGGCATCA TGGACCAGTT CATCTCACTT ATGGGACAGA 
2160 

AAGGCCACGC GCTGCTCATT GACTGCAGGT TGGGCTCGCT CCCCTCGTCC CCTCCCGCCC 
2220 

TGCACTCAGC AGCTCCTGGG TGGAGTGTGC CCACTGCCTG GCGCAGCAAG CACACGCTTG 
2280 

GCCTCGTCAT CTCCCCCATT GTAACTCCAC CCCAGGTCCT TGGAGACCAG CCTGGTGCCA 
2340 

CTCTCGGACC CCAAGCTGGC CGTGCTCATC ACCAACTCTA ATGTCCGCCA CTCCCTGGCC 
2400 

TCCAGCGAGT ACCCTGTGCG GCGGCGCCAA TGTGAAGAAG TGGCCCGGGC GCTGGGCAAG 
2460 

GAAAGCCTCC GGGAGGTACA ACTGGAAGAG CTAGAGGGTG AGAACTGCCA GGGTGCTCTA 
2520 

TCCTGGAGGC GGCTGTGCTC CCTGCTGGCG CCTCAGTGTG GCCTTGACCC TGCCTGGGAC 
2580 

CCCGATCTCC AGGGGCTTCT GCCATGCTCT CCCCAGTCCC TTCAAACACT GCGCACCCAG 
2640 

GGTTCCAATC TCAGCAGGGG TGCTTGAAAT CCTAAAATGG TCTTATCTAA TCAGAAAAAT 
2700 

CATGTTTCCA TTGTGGAAAA TGTAGAAAAG TACAAAGTAG AAAATAATAA GCTATAAGGG 
2760 

CACTACCCAG AGATAGGCAC TGCTGACATT TTCACGTTTC CTTTCAGTAT TTTTCCACAT 
2820 
CTGTCTTCAA AGCTGAGTAT ATGTAATATA 
2880 

TTAAGAGGCA GGGTCTCATT CTGTTGCCCA 
2940 

ACTGCAAACT TGAACTCTTG AGCTCAAGGG 
3000 

AGATTACAGG TGTGCCACCA TGCCCGGCTA 
3060 

TGTTGCCCAG GATGATCCTG AACTCTGGCC 
3120 

AGTGTTGGGA TTATAGGCAT GAGCCACTGC 
3180 

GACACAGAGT TTCGGTCTTG TCACCCATGC 
3240 

GTAACCTCTG CCTCCCGGGT TCAAGTGATT 
3300 

CTACAGGCGC CCGCCACTAC GCCTGGCACA 
3360 

TCACCATGTT GGCCAGGCTG GTCTCAAACG 
3420 

CTTCCAAAGT GCTGGGATTA CAGGCGTGAG 
3480 

TTAAACTAAA CATAATCTCA GAACCCAGAA 
3540 

ATCTCGGCGT GGCTCTTTTT XTTTTTTTTT 
3600 
TCATCACTTT CCCCCCCCAC CCCCTTTTTT 

AGCTGGAGTG TAGTGGTGTG ATCATAGCTT 

ATCCTCCCAG CTCAGCCTTC CAAGTAGCTG 

ATTTTTATCT TCGTAAAGAC GGCCTTGTAG 

TCAAGAGGTC CTCCTGCCTT GGGCTCCCAA 

GGCCAGCCCA TTTGCCGTGT TTTTTTTTTG 

TGGAGTGCAA TGGTGCGATC TCAGCTCACT 

CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA 

TTTTTTATAG TTCTAGTAGA GACTGGGGTT 

CCTGACCTCA GGTGATCCTC CCGCCTCAGC 

CCATAGTGCC GGTCTCTTTT txxTTTTTTT 

CCCTATCTTA TCTTATGCCA TGAAAGGCAT 

CTTTTTTTTT GGGCGAGGTG GAGGCTTGCC 
CTGTTCCCCA GGCTGGAGTG CAGCGGCGCA ATCTCGGTTC ACTGCATCCT CCACCTCCTG 

GGTCCAAATG ATCCTCCTGC CTTAGCTTCC TGAGTAGGTG GGATTACTGG 

3720 AACCCACCAC 

CACGCCCAGC CAATTTTTAT ATTTTTAGTA GAGACGGGGT TTCATGTTGG CCAGGCTGGC 

3780 

GGATTACATG 

CTCGAACTCC TGACCTCGTG ATCTGCCCGC CTCAGCCTCC CAATGTGCTA 
3840 

TGTGAGCCAC TGCACCTGGC CTCCGTGTGG CTCTTTAAAG CTCCACAATA TTTTAGCATT 
CAGGTGCTCT GTCATTTACT TAACTATTTX CTGATACACC TCACACTGCG ATTAACTTTC 

CTTATTTATC TTTTTTATTA TTTATTTATT TATTTATTTG 

4020 lATTTATTTG AGACAGAGTC TTGCTCTGTC 

ACCCAGGCXG GAGTGCAGTG GCACGATCTC CGCTCACTGC AACCTCTCCC TCCCAGGTTC 

CCACCACAC 

CTGGCTAATC TTCGTATTTT TAGCAGAGAT GAGGTTTTAC CATGTTGGTC GGGCTGGXCG 
TGAACTCCTG ACCTGGTGAT CTGCCCACCT CAGCCTCCCA AAGTACTGGG ATGACAGGCA 
TGAACCACTG TGCCTGGCCA TCTTTTTTAT TTTTTAAAGA GATGGGTTCT GCTAAGTTGC 

CCAGGCTGGA CCTGAACTCT TGGGCTCAAG TAATCTTCTC ACCTAGTCTC CTGGGTAGCT 
GCAACCAAAG GCACCCGGTT TATCTGCATT CTCTTTTTTT TCTTTGAGAC TGAGTCTTGC 
4440 

TCTGTAGCCC AGGCTGGAGC GCAGTGGCGT GATCTCGGCT CACTGCAACC TCCGTCTTCA 
4500 


GGGTTCAAGC AATTCTCCTG CCTCAGCCTC TGGAGTGGCT GGGACTACAG GCGTGTGCCA 
4560 

CCAGAGCGAG TTAATTTTTT TTTTTTTTTG TATTTTTAGT GGACACTGGG TTTCACTATA 
4620 

TTGGCCAGGC TGGTCTTGGA CTCCTGACCT CAAGTGATCC GCCTGCCTTG GCCTCCCAAA 
4680 

GTGCTGGGAT TACAGGCACA GGCGTGAGCC ACTACACCTG GCCTATCTGC ATTCTCTTAA 
4740 

TAGTTTCTTA GAAATGGATT CTTAGGAGTA GGATTACAGA GTCAAGAGAC ACAAGTTTTG 
4800 


TAGGCTGGGT GCGGTGGCTC ACGTCTGTGC CTGTAATCCC AGTACTTTAG GAGGCCAAGG 
4860 

TGGGCAGATT CATTGAGCTC AGGAATTCGA GACCAGCCTG GGCAACATGG CAAAACCCCA 
4920 

TCTCTAAAGA AATACAAAAA TTAGCCAGGT GTGGTGGTGT GTGCCTGTAG TCCTAGCTAC 
4980 

TTAGGAGGCT GGGGTGGGAG GATCAATTGA GCCCAGGAGG TTGAGACTGC AGTGAGCTGT 
5040 

GATTGCACCA TGGCACTCCA GCCTGGGCCT CAAAGTGAGA TCCTGTCTCC AAAACAAAAA 
5100 


AGATACAAGT ATCCTTAAGG CTCCTGCTAC ACATGGCCAG GAAGGTAGTC TATTGGACAG 
5160 
TTTTAAGGTC ATTATCAATA TTAGCTCATT 
5220 

TCTGCTACCA TAGTTGTCAT ATTTTTGATG 
5280 


ATCTGAACCC CATCTGGACA GATTAGCTCC 
5340 

CTGCCCACAC TGAGTTGTTC CTTCCTGGCA 
5400 

ACAGCTCCGA GGGACCTCCC TGTCCTTTTC 
5460 

AGGGCTCAGA CCTGCCCTGC CTGCTCTGTG 
5520 

GAACAAGTTG GTCCCTCCTC CCCACCCCAG 
5580 


CTGCATAGGA GCAGCTCACC CTGCCTCCTC 
5640 

TCCTTCCATC CCCTGCCTGG CTGCCTGGCT 
5700 

TTTCTTCCCT CCTAAAACAC CACCCACTGT 
5760 

TTTCTTTTTT TTTTTTGAGA GGGAGCCTCA 
5820 

TGATCTCCAC TCACTGCAAC CTCCGCCTCC 
5880 


CCTGAGTAGC TGGGATTACA GGCGCCTGCC 
5940 
TAATTCCCTC CAAAACTCTG TAAAGCACAT 

GGGGAATCTA CAGTGAGAGG CAGTGCTGGG 
AGGGCCCATG CTCTTGACTG GCTGGCCGCG 
GGGTAGGTGT GCCTATCTCA GGGACACTAG 
CTTTGTGAAC TGTGTCACGT TCTCCAGAGC 
CAGATGCCCT TGGCCAAGGT TTTCACACTG 
CCTGTCCTTG GCCCTCCTCC AGGTCTCCTT 
CAGAGTCCTG CCCTAGAAGC GCAATCCCTC 
CCTTCCCTCA GCCTCCAAGA CATGCTCAGT 
CTCATTTCCA TTCATTTCTT TCTTTCTTTC 
CTCTGTCACC CAGGCTGAAG TGCAGTGGCA 
CAGGTTCAAG CAATTCTCCT GCCTCAGCCT 
ACGATGCCCG GCTAACTTTT GTATTTTTAG 
TAGAGACGGG GTTTCGCCAT GTTGGCCAGG CTGGTCTCGA GCTCCTGACC TCAGGCAATC 
6000 

TGCCTGCCTC AGCTTCCCAA AGTGCTGGGA TTACAGGTGT GAGCCACCGC GCCCACCCAT 
6060 

TCATTTCTCA GTCCTTTGAA TCTACTTGCC CCTCCATCCC GCCATGCCAC CTACCCTAAC 
6120 

AACCTTCCCC CTTAAACCTG CGGGTTTGGC CGGGCGCAGT ACACTGAGTC AGTACTGGTA 
6180 

CTGACCCAGG TACCCCTCCA GCCTCAGCTC CAGTCAGATG GGACAGCCTG CTGGTCCCTG 
6240 

GCTGCTTCTG CCCCCTCTTC TGGAGCCCCA GCCCTGGAGG CTCCATGTGG CTCAGCAGAA 
6300 

CTTCTTCTCC TCCTGCTCTG TGGTGGCCTC TTGAGGGCAG CACTCACCTT GGAAAGCATG 
6360 

GAGTGTTTCA ACCCTCACTG CTCCCTGAAG GACCAAGGTG TCCCATTTTA CAGTCGGGGG 
6420 

AGGAGGCACT GTGATAAAGG GGCTCTTCAG ACCCACGTCT GAGAGAGCCA GGCTGCGCCG 
6480 

CCCCCGCGGC CTTCCACCCT TCACCGTCCA GCCAGGGCCA CTGCCATCAC CGCCTGCTGG 
6540 

TCCTCACAGG CGTCGGGGCC CCAGGCAGTG AGAAGGCGGC TGCTGACTCC TCTTTCCTCC 
6600 

CCAGCTGCCA GGGACCTGGT GAGCAAAGAG GGCTTCCGGC GGGCCCGGCA CGTGGTGGGG 
6660 

GAGATTCGGC GCACGGCCCA GGCAGCGGCC GCCCTGAGAC GTGGCGACTA CAGAGCCTTT 
6720 

GGCCGCCTCA TGGTGGAGAG CCACCGCTCA CTCAGGTGAG GCCCTCTGGG CGCCCCGCTC 
6780 

CTGCCGGGCA CAGGCCGGCC CAGGCCCACC CCTTCAATAT CCTCTCTGCA GAGACGACTA 
6840 

TGAGGTGAGC TGCCCAGAGC TGGACCAGCT GGTGGAGGCT GCGCTTGCTG TGCCTGGGGT 
6900 

TTATGGCAGC CGCATGACGG GCGGTGGCTT CGGTGGCTGC ACGGTGACAC TGCTGGAGGC 
6960 

CTCCGCTGCT CCCCACGCCA TGCGGCACAT CCAGGTGGGC GGGCACCAGG GCCTGGGCGG 
7020 

GCAGGAGCGG CAGCTTCCCG GGGCCCTGCC ACTCACCCCC AGCCCGCCTC TTACAGGAGC 
7080 

ACTACGGCGG GACTGCCACC TTCTACCTCT CTCAAGCAGC CGATGGAGCC AAGGTGCTGT 
7140 

GCTTGTGAGG CACCCCCAGG ACAGCACACG GTGAGGGTGC GGGGCCTGCA GGCCAGTCCC 
7200 

ACGGCTCTGT GCCCGGTGCC ATCTTCCATA TCCGGGTGCT CAATAAACTT GTGCCTCCAA 
7260 

TGTGGTACCT GCCTCCTCTA GAGGTGGGTG TATGCTTGGG TGTCAGAGAA TGGGGGATGT 
7320 

CAGAACCGCT CCCCTACCCT AGGGGAGCAC CTCTCAGGCC CCAGAAGAAT GGGCAAGGCA 
7380 

GGGCCTAGCA GTAGCAAAAC CATTTATTAA GTGCAGAACA AAGGCTGGGT CCTTGTGCTG 
7440 

CTCCCAGCTC TTTGGTTACA AATAGGTTTG GGCCCACAGA GGACGGACCT TGCCCCCTTC 
7500 

ATGCCTCCCA GGAGACACCT AGCCCCTGCT CTGTGCATGC GGGTGGGCTG GGCCCCCAGG 
7560 

GGTGCAAGGA TGGAGTAGCT GAGGAGGCTC CGGGAGAGGA GTCGGGAGGA CGCCTAGTGG 
7620 

GACATTGCGG GGGTGGCGCA GGGTGCGGTC AAGTTTGGAA GAAACTGTTG GGTCCA 
7676 

(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

AGCCTTCCGG GAGGAGTTCG G 
21 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

CTGGTTGTAG TCCGTGTGTT C 
21 
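SEQ ID NO:8 matches the sense strand of the cDNA across codons 18 to 25, and SEQ ID NO:9 is the reverse complement of codons 43 to 49, so the two read as a forward/reverse pair flanking codon 32. Treating them as a pair is an assumption made here for illustration; the listing itself does not say how the oligonucleotides were combined. The Python sketch below rebuilds nucleotides 1 to 196 of the cDNA from the SEQ ID NO:6 rows and predicts the product; the helper names are hypothetical.

```python
# Hedged sketch: locate a forward primer and the reverse complement of a
# reverse primer on a template and report the predicted PCR product.
# Pairing SEQ ID NO:8 with SEQ ID NO:9 is an illustrative assumption.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> tuple[int, int, int]:
    """Return (1-based start, 1-based end, length) of the predicted product."""
    template = template.upper()
    f = template.find(forward.upper())
    r = template.find(reverse_complement(reverse))
    if f < 0 or r < 0:
        raise ValueError("primer site not found on the template")
    end = r + len(reverse)  # exclusive 0-based end == inclusive 1-based end
    if end <= f:
        raise ValueError("primers do not face each other")
    return f + 1, end, end - f


# Nucleotides 1-196 of the cDNA as printed in SEQ ID NO:6.
TEMPLATE = (
    "GAATTCGGCACGAGTGCAGGCGCGCGTC"                      # 5' leader, nt 1-28
    "ATGGCTGCTTTGAGACAGCCCCAG"                          # codons 1-8
    "GTCGCGGAGCTGCTGGCCGAGGCCCGGCGAGCCTTCCGGGAGGAGTTC"  # codons 9-24
    "GGGGCCGAGCCCGAGCTGGCCGTGTCAGCGCCGGGCCGCGTCAACCTC"  # codons 25-40
    "ATCGGGGAACACACGGACTACAACCAGGGCCTGGTGCTGCCTATGGCT"  # codons 41-56
)

SEQ_ID_8 = "AGCCTTCCGGGAGGAGTTCGG"
SEQ_ID_9 = "CTGGTTGTAGTCCGTGTGTTC"

if __name__ == "__main__":
    print(reverse_complement(SEQ_ID_9))           # GAACACACGGACTACAACCAG
    print(amplicon(TEMPLATE, SEQ_ID_8, SEQ_ID_9))  # (82, 175, 94)
```

Under these assumptions the product would run from nucleotide 82 to 175 of the cDNA (94 bp) and would include codon 32.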

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GCCAGCAGCT CCGCGACCTG G 
21 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GCTTCCTCCC TTCCAACGTG G 
21 



(2) INFORMATION FOR SEQ ID NO: 12: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CCCAGGCTCC AGCGAGCGCT G 
21 

(2) INFORMATION FOR SEQ ID NO: 13: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

ACCTCTGAGG GTGCCGATGA G 
21 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

CCCACAGCTC AGGGCAGAGT C 
21 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GGACACTTCT AATTGTCCTC C 
21 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GATGAACTGG TCCATGATGC C 
21 

(2) INFORMATION FOR SEQ ID NO: 17: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

AGGGGCACTG AGCTGACCAC C 
21 

(2) INFORMATION FOR SEQ ID NO: 18: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CACTTCTACA CATTGGCGCC G 
21 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



CTTCGCAGGG ATGCCCTGTG G 
21 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

TCATCACCAA CTCTAATGTC C 
21 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TGTCAGCAGT GCCTATCTCT G 
21 

(2) INFORMATION FOR SEQ ID NO: 22: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

AGCAGCGGAG GCCTCCAGCA G 
21 

(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

CCTCACCGTG TGCTGTCCTG G 
21 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

GGCTGCGCTT GCTGTGCCTG G 
21 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

CCTCACCGTG TGCTGTCCTG G 
21 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

CCTCACCGTG TGCTGTCCTG G 
21 

(2) INFORMATION FOR SEQ ID NO: 27: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

GCGGGACTGC CACCTTCTAC C 
21 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTCAATAAAC TTGTGCCTCC A 
21 

(2) INFORMATION FOR SEQ ID NO: 29: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

CGGATATGGA AGATGGCACC GGG 
23 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

AGAGCTGCAG GCGCGCGTCA TG 
22 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

CCGAGGATCC CGCGCCGAC 
19 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



CAGCTGCCCG CCCCACATCT 
20 

WHAT IS CLAIMED IS: 

1. An isolated nucleic acid molecule encoding human genomic galactokinase, 
said nucleic acid molecule selected from the group consisting of: 
(a) a nucleic acid molecule comprising the sequence as set forth in SEQ ID 
NO:7; and 

(b) a nucleic acid molecule differing from the nucleic acid molecule of (a) in 
codon sequence due to the degeneracy of the genetic code. 

2. A vector comprising the nucleic acid molecule of claim 1. 

3. A recombinant host cell comprising the vector of claim 2. 

4. An isolated nucleic acid molecule comprising a DNA sequence that encodes 
nucleotides 29 to 1204 of SEQ ID NO:5 or nucleotides 29 to 265 of SEQ ID NO:6. 

5. A vector comprising the nucleic acid molecule of claim 4. 

6. The vector according to claim 5 which is a plasmid. 


7. A recombinant host cell comprising the vector of claim 5. 



8. A process for preparing a human galactokinase protein comprising 
culturing the recombinant host cell of claim 7 under conditions promoting expression 
of said protein and recovery thereof. 

9. An isolated protein encoded by the DNA sequence of claim 4. 

10. A monoclonal antibody that is specifically reactive with the protein of 
claim 9. 

11. A method for diagnosing conditions associated with human galactokinase 
deficiency which comprises isolating a serum or tissue sample from an individual; 
allowing such sample to come in contact with an antibody or antibody fragment 
which specifically binds to the human galactokinase protein of claim 9 under 
conditions such that an antigen-antibody complex is formed between said antibody 
or antibody fragment and said galactokinase protein; and detecting the presence or 
absence of said complex. 



12. A method for diagnosing conditions associated with human galactokinase 
deficiency which comprises isolating a nucleic acid sample from an individual; assaying 
said sample and the DNA sequence, or corresponding RNA sequence, that encodes a 
human galactokinase gene; and comparing differences between said sample and said 
DNA (or RNA) that encodes nucleotides 29 to 1204 of SEQ ID NO:4, wherein said 
differences indicate mutations in the human galactokinase gene. 

13. The method of claim 12 wherein said sample is RNA which is 
subsequently amplified by PCR-RT. 



14. The method of claim 13 wherein assaying said sample comprises a 
restriction endonuclease digestion. 

15. The method of claim 14 wherein said restriction endonuclease is Msc I. 
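In the listings, codon 32 reads ATG (Met) in SEQ ID NO:5 but GTG (Val) in SEQ ID NO:6 and in the genomic SEQ ID NO:7. Because it follows CTG GCC, the ATG allele spells TGGCCA, the Msc I recognition site (blunt cut TGG^CCA), whereas the GTG allele reads TGGCCG and is not cut. One plausible reading of claim 15 is therefore that this missense change is scored by Msc I digestion of an amplified fragment; that link is an interpretation, not something this section states. The Python sketch below illustrates the idea on the printed codon 30 to 34 windows; the helper names are hypothetical.

```python
# Hedged sketch: find Msc I sites (assumed recognition sequence 5'-TGGCCA-3',
# blunt cut TGG^CCA) in a DNA window and predict complete-digest fragments.
# TGGCCA is its own reverse complement, so scanning one strand finds all sites.

MSC_I_SITE = "TGGCCA"
MSC_I_CUT_OFFSET = 3  # the cut falls after the third base of the site


def msc_i_cut_positions(seq: str) -> list[int]:
    """0-based positions at which Msc I would cut the given strand."""
    seq = seq.upper()
    cuts = []
    start = seq.find(MSC_I_SITE)
    while start != -1:
        cuts.append(start + MSC_I_CUT_OFFSET)
        start = seq.find(MSC_I_SITE, start + 1)
    return cuts


def fragment_lengths(seq: str) -> list[int]:
    """Fragment lengths after complete digestion (whole length if uncut)."""
    bounds = [0, *msc_i_cut_positions(seq), len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Codons 30-34 (CTG GCC xTG TCA GCG) as printed in the two cDNA listings.
SEQ5_WINDOW = "CTGGCCATGTCAGCG"  # ATG (Met) at codon 32
SEQ6_WINDOW = "CTGGCCGTGTCAGCG"  # GTG (Val) at codon 32

if __name__ == "__main__":
    print(fragment_lengths(SEQ5_WINDOW))  # [4, 11]: site present, window cut
    print(fragment_lengths(SEQ6_WINDOW))  # [15]: no site, window uncut
```

In practice the same function would be applied to a full amplicon rather than a 15-nt window, so the fragment sizes would differ; only the presence or absence of the cut carries over.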

16. The method of claim 12 wherein assaying said sample comprises a 
hybridization assay. 

17. The method of claim 16 wherein the hybridization assay is heteroduplex 
electrophoresis which comprises determining differential mobility of heteroduplex 
products in polyacrylamide gels, said heteroduplex products are the result of 
hybridization between the nucleic acid sample and the DNA sequence, or 
corresponding RNA sequence, that encodes nucleotides 29 to 1204 of SEQ ID NO:4. 

18. The method of claim 12 wherein assaying said sample comprises gel 
electrophoresis of restriction fragment length polymorphisms of said nucleic acid 
sample and the DNA sequence, or corresponding RNA sequence, that encodes 
nucleotides 29 to 1204 of SEQ ID NO:4. 
19. The method of claim 12 wherein assaying said sample comprises DNA 

sequencing. 
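Claims 12 and 19 describe comparing a sequenced sample against the reference coding sequence and reading the differences as mutations. A minimal Python sketch of that comparison follows. It assumes that the two sequences are already aligned in frame with no insertions or deletions and that the standard genetic code applies; the function names are illustrative. The example windows are taken from the listings: the genomic SEQ ID NO:7 reads GTG at codon 32 and GAG at codon 80, whereas SEQ ID NO:5 reads ATG at codon 32 and SEQ ID NO:6 reads TAG at codon 80.

```python
# Hedged sketch of the comparison step in claims 12 and 19. Assumes the two
# sequences are aligned, in frame and free of insertions/deletions, and uses
# the standard genetic code. Function names are illustrative only.

from itertools import product

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_changes(reference: str, sample: str, first_codon: int = 1):
    """Yield (codon number, ref codon, sample codon, ref aa, sample aa)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    usable = len(reference) - len(reference) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        ref, alt = reference[i:i + 3].upper(), sample[i:i + 3].upper()
        if ref != alt:
            yield (first_codon + i // 3, ref, alt,
                   CODON_TABLE.get(ref, "X"), CODON_TABLE.get(alt, "X"))


def describe(change) -> str:
    """Label a change in one-letter notation, e.g. 'V32M (missense)'."""
    number, _, _, ref_aa, alt_aa = change
    if alt_aa == "*" and ref_aa != "*":
        kind = "nonsense"
    elif ref_aa != alt_aa:
        kind = "missense"
    else:
        kind = "silent"
    return f"{ref_aa}{number}{alt_aa} ({kind})"


if __name__ == "__main__":
    # Codons 30-34: genomic SEQ ID NO:7 vs SEQ ID NO:5.
    for change in codon_changes("CTGGCCGTGTCAGCG", "CTGGCCATGTCAGCG", 30):
        print(describe(change))  # V32M (missense)
    # Codons 79-82: genomic SEQ ID NO:7 vs SEQ ID NO:6.
    for change in codon_changes("TCTGAGGGTGCC", "TCTTAGGGTGCC", 79):
        print(describe(change))  # E80* (nonsense)
```

Choosing the genomic clone as the reference in these examples is an assumption made for illustration; the claims themselves name SEQ ID NO:4, which is not reproduced in this section.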



20. A method for diagnosing conditions associated with human galactokinase 
deficiency which comprises isolating cells from an individual containing genomic DNA 
and assaying said sample by in situ hybridization using the DNA sequence that 
encodes nucleotides 29 to 1204 of SEQ ID NO:4, nucleotides 29 to 1204 of SEQ ID 
NO:5, or nucleotides 29 to 265 of SEQ ID NO:6; or a fragment that encodes at least 
one exon of said sequence; or a fragment containing at least 15 contiguous base pairs 
of said sequence as a probe. 

21. A transgenic non-human mammal capable of expressing in any cell thereof 
the DNA of claim 4. 
FIGURE 2(a) 

CCGAGCATCCCGCGCCCACGGGTCTGTGCCGGAGCAGCTGTGCAGACCTGCAGGCGCGCG -3 
TCATGGCTGCTTTGAGACACCCCCAGGTCCCGGAGCTGCTGGCCGAGGCCCGGCGAGCCT 58 
MAALRQPQVAELLAEARRA 

TCCGCCAGGAGTTCGGGGCCGAGCCCGAGCTGGCCGTGTCACCGCCGGGCCGCCTCAACC 118 
FREEFGAEPELAVSAPGRVN 

TCATCGGGGAACACACGGACTACAACCAGGCCCTGGTCCTGCCTATGCTGACGGGCTCCA 178 
LIGEHTDYNQGLVLPM 

CGGGGAGCCCCTAGCCCGCCGCCGCCTGTCCCGGTCGCCGAGGAGGCCGGGCCTCGGGGA 238 

CGCTGGGGGCGAGTTCTTCCCGCGGGAGATGTGGGGCGGGCAGCTGCCCCTCGAGCACCG 298 

GTGCACGGAAGAGTCCCCGGGACAGGCTGTTCCCCACGTTGGAAGGGAGGAAGCGAAGAA 358 

GTGGTCCCCAGAGGGTGCGCGGCCGCCTCTTGGCTCAAGCCCGCCCTCTGGGGGCTGGGG 418 

CTCCTCGCCTTCAACCTGGGAGCATGTTCCCCTTAAACTGTGAGGCCCTGTCTGCCACGC 478 

AGAAGGGGACACTCCGCGCCTCCGGCCACCGTGGGGCCCCAACCGCAGACCTGGGCGAAC 538 

GTAGCCTTCTGGCCCAGCCCGTTCAATTTACAGAGGAGGAAACTGAGGCCTACAGAGGCC 598 

CAGTGAACTGCTGGAGCTCACACAGCAGGTTCTTGGCGGGGCTGCGACTTGGGAGTGAGG 658 

ACTCCCAGCTTTCAGCGGGGGGCGCTTTCCGCCCCATCTGCAGCTTGGGGAGTGCACAGG 718 

TACAGGATGTCCAGAGCCACCCAAAATGTAAAGGCTTTGGAGCTCCAGTGATCTGTTTTC 778 

CCTTTGGGCTAAGCTCTCCCCCCTTGCCCCACAGCTCAGGGCAGAGTCCAGGTCTGTGCT 838 

CCAGCTGCAGCCGCCCCGCCCCTGAAGACCTAAGGGGGCAGGGCTCAAGCCCCCAAGGTC 898 

AGCTGGCCCTCAGGATCTTCCCTGCGACGCTGAACCTGGAGGTTCAGAACCTGATGACTG 958 

TGGAGGCATCAGAACCTCGGCTGGAGGCAGTGTCATTGGAGAGGCTTACTCCAGCTGGCG 1018 

GAAGCCTCACGTACTGCTTGTCTCTCCTGCCAGGCTCTGGAGCTCATGACGGTGCTGGTG 1078 

ALELMTVLV 

GGCAGCCCCCGCAAGCATGGGCTGGTGTCTCTCCTCACCACCTCTGAGGGTGCCGATGAG 1138 
GSPRKDGLVSLLTTSEGADE 

CCCCAGCGGCTGCAGTTTCCACTGCCCACAGCCCAGCGCTCGCTGGAGCCTGGGACTCCT 1198 
PQRLQFPLPTAQRSLEPGTP 

CGGTGGGCCAACTATGTCAAGGGAGTGATTCAGTACTACCCAGGTATGGGGCCCAGGCCT 1258 
RWANYVKGVIQYYP 

GAGCCAAGTCCTCACTGATACTAGGAGTGCCACCTCACAGCCACAGAGCCCATTCATTTG 1318 
TCTGATACACTGTGGGGAAGGCTTGTAGAGTGGAGCATCCCATTGTACAGATGAGGAAAC 1378 
TGATGCCCCCAGAAGGTCGGGAACTTGCCCTGGGTTTCCCGTGACCTGATTGCAGGAGCC 1438 
AGGATTTGAACCCCAGCCTTTTTTCCCTCCAGAGCCCTAAACCAGGAGGACAATTAGAAG 1498 
TGTCCCAGCAACCTCAGAGGGTGGGAAAATGGAGGGGAGTGGGTCCCTTGGGCCAGCAGG 1558 
TTGGTGGGGTTCTTGACAATTGAGACACACACCTAGAAACAGTTGCTAGGCCGTTCCTCC 1618 
CCTTCCCGCCAGGACACCTGCCCTTCCTGTCCAATCCTCCCAGGCAGCCTCTCTTACCAT 1678 
CACCTGTTCTTTCCCCCTGCAGCTGCCCCCCTCCCTGGCTTCAGTGCAGTGGTGGTCAGC 1738 

AAPLPGFSAVVVS 

TCAGTGCCCCTGGGGGGTGGCCTGTCCAGCTCAGCATCCTTGGAAGTGGCCACGTACACC 1798 
SVPLGGGLSSSASLEVATYT 

TTCCTCCAGCAGCTCTGTCCAGGTACCAGCTAGGCCCCAGCCCTGACCCAGCCCTCCTTC 1858 
FLQQLCP 

CCTGAGGTCTCCAGGTGGTCCCAGCTTCTACTATGCCTTATGGAGGGGGTGGCAGGGAAT 1918 

CTCCCTGGAGTGTCATTGAAGCCACTGCTGCTTCCACCAGCCCTAGCCTCCCCACCTCAC 1978 

CCTGTACTGCAGACTCGGGCACAATAGCTGCCCGCGCCCAGGTGTGTCAGCACGCCGAGC 2038 
DSGTIAARAQVCQQAE 

ACAGCTTCGCAGGCATGCCCTCTGGCATCATGGACCAGTTCATCTCACTTATGGGACAGA 2098 
HSFAGMPCGIMDQFISLMGQ 

AAGGCCACGCGCTGCTCATTGACTGCAGGTTGGGCTCGCTCCCCTCGTCCCCTCCCGCCC 2158 
KGHALLIDCR 
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FIGURE 2(b) 

[...]CACTCAGCACCTCC[...]AGTCTGCCCACTC[...]
[...]GGCGTGGCTC[...]CATGAAAGCCAT 3478
CTGTTGCCCAGGCTGGAGTGCAC[...]

CACGCCCAGCCAAT[...] 3658

CTCGAACTCCTCACCTC[...] 3718
TGTGAGCCACTGCACCTCGCCTCCG[...] 3778

ACAGCTCCGAGGGACC[...]ATCTCAGGGACACTAG 5338
AGGG[...]CAGACCTCCCT[...]GCCTGCTCT[...]CA[...]AGC 5398
FIGURE 2(c)

CTGC[...]ACCCTGCCTCCTCCACAGTCCTGCCCTAGAACCGCAATCCCTC 5578

TCC[...]CATC[...]TGCCTGCCrrGGCTCClTCCCTCAGCCTCCAACACATCCTCAGT 5638

TTTCTTCCCTCCTAAAACACCACCCACTGTCTCATTTCCATTCATTTCTTTC[...] 5698

[...]AGAGGCAGCCTCACTCrGTCACCCACGCTGAACTGCACTCGCA 5758

TGATCTC[...]AACCTCCGCCTCCCAGCTTCAAGCAATTC[...]CCTC[...]TCAGC[...]T 5818

CC[...]AC[...]TACCTGCCAT[...]ACAGGCGCCTGCCA[...]GATGCCCGGCTAAC[...]TTTGTATTTTTAG 5878

TAGAGACCGGC[...]TTC[...]CATGTTGGCCAGGCT[...]GTCTCGAGCTCCTGACCTCAGGCAATC 5938

TGCCTG[...]ACC[...]CAAAGTGCTGGGATrACAGGTGTGAGCCACCGCGCCCACCCAT 5998

TCATTTCTCAGTCCTT[...]GAATCTACTTGCCCCTCrATCCCGCCATGCCACCTACCCTAAC 6058

AACCTTCCCCCT[...]AAACCTGCCGGTTTGGCCGGGCGCACTACACTGAGTCAGTACTGGTA 6118

CTGACCCAGGTACCCnCCAGCCTCAGCTCCAGTCAGATGGGACAGCCTCCTGGTCCCTG 6178

GCTGCTTCTGCCCC[...]CTTCTGGAGCCCCAGCCCTCGAGGCTCCATGTGGCTCAGCAGAA 6238

CTTCT[...]CTCCTCCTGCTCTGTGGTGGCCTC[...]TGAGGGCAGCACTCACCTTGGAAACCATG 6298

GAGTGTTTCAACCCTCACTGCTCCCTGAAGGACCAAGGTGTCCCATTTTACACTCGGGGG 6358

AGGAGGCACTGTGATAAAGGGGCTCTTCAGACCCACGTCTGAGAGAGCCAGGCTGCGCCG 6418

CCCCCGCGGCCT[...]CACCCTTCACCGTCCAGCCAGGGCCACTGCCATCACCGCCTGCTGG 6478

TCCTCACAGGCGTCGGGGCCCCAGGCAGTGAGAAGGCGGCTGCTGACTCCTCTTTCCTCC 6538

CCAGCTGCCAGGGACCTGGTGAGCAAAGAGGGCTTCCGGCGGCCCCGGCACGTGGTGGGG 6598
AARDLVSKEGFRRARHVVG 

GAGATTCGGCGCACCGCCCAGCCAGCGCCCGCCCTCAGACGTGGCGACTACACAGCCTTT 6658
EIRRTAQAAAALRRGDYRAF

GGCCGCCTCATGGTGGAGAGCCACCGCTCACTCAGGTGAGGCCCTCTGGGCGCCCCGCTC 6718
GRLMVESHRSLR

CTGCCGGGCACAGGCCCGCCCAGGCCCACCCCTTCAATATCCTCTCTGCAGAGACGACTA 6778

DDY

TGAGGTGAGCTGCCCAGAGCTGGACCAGCTGGTGGAGGCTGCGCTTGCTGTGCCTGGGGT 6838
EVSCPELDQLVEAALAVPGV 

TTATGGCAGCCGCATGACGGGCGGTGGCTTCGGTGGCTGCACGGTGACACTGCTGGAGGC 6898
YGSRMTGGGFGGCTVTLLEA

CTCCGCTGCTCCCCACCCCATGCGGCACATCCAGGTGGGCGGGCACCAGGGCCTGGGCGG 6958
SAAPHAMRHIQ 

GCAGGAGCGGCAGCTTCCCGGGGCCCTGCCACTCACCCCCAGCCCGCCTCTTACAGGAGC 7018

E 

ACTACGGCGGGACTGCCACCTTCTACCTCTCTCAAGCAGCCGATGGAGCCAAGGTGCTGT 7078
HYGGTATFYLSQAADGAKVL 

GCTTGTGAGGCACCCCCAGGACAGCACACGGTGAGGGTGCGGGGCCTGCAGGCCAGTCCC 7138
CL*

ACGGCTCTGTGCCCGGTCCCATCTTCCATATCCGGGTGCTC[...]CTTGTCCCTCCAA 7198

TGTGGTACCTGCCTCCTCTAGAGGTGGGTGTATGCTTGGGTGTCAGAGAATGGGGGATGT 7258

CAGAACCGCTCCCCTACCCTAGGGGAGCACCTCTCAGGCCCCAGAAGAATGGGCAAGGCA 7318

GGGCCTAGCAGTAGCAAAACCATTTATTAAGTGCAGAACAAAGGCTGGGTCCTTGTGCTG 7378

CTCCCAGCTCTTTGCTTACAAATAGGTTTGGGCCCACAGAGGACGGACCTTGCCCCCTTC 7438

ATCCCTCCCAGGAGACACCTAGCCCCTGCTCTGTCCATGCGGGTGGGCTGGGCCCCCAGG 7498

GGTGCAAGGATGGACTAGCTGAGGAGGCTCCGGCAGACGAGTCGGGAGGACGCCTAGTGG 7558

GACATTCCGGGGGTCGC[...]GCAGGGTGCGGTCAAGTTrCGAAGAAACTGTTGGGTCCA 7614